Architecture and Working of LSTM
The LSTM architecture has a chain structure consisting of four neural networks and different memory blocks called cells.
Information is retained by the cell state, while the memory operations are carried out by the gates. There are three gates: the Forget Gate, the Input Gate, and the Output Gate.
Forget Gate
Information that is no longer useful in the cell state is removed through the Forget Gate. The two inputs, x_t (the input at the current time step) and h_(t-1) (the previous hidden state), are fed to the gate, multiplied by the weight matrix, and a bias is added. The result is passed through a sigmoid activation function, which gives an output between 0 and 1. If the output for a particular cell state element is 0, the information is forgotten; if it is 1, the information is retained for future use. The equation for the Forget Gate is:
f_t = σ(w_f · [h_(t-1), x_t] + b_f)
w_f represents the weight matrix associated with the Forget Gate.
[h_(t-1), x_t] denotes the concatenation of the previous hidden state and the current input.
b_f is the bias of the Forget Gate.
σ is the sigmoid activation function.
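To make this concrete, here is a minimal NumPy sketch of the Forget Gate computation. The sizes, random weights, and the `sigmoid` helper are illustrative assumptions, not values from the text.

```python
import numpy as np

def sigmoid(z):
    # Sigmoid squashes values into (0, 1): near 0 means "forget", near 1 means "keep".
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative sizes (assumptions for the sketch)
input_size, hidden_size = 4, 3
rng = np.random.default_rng(0)

x_t = rng.standard_normal(input_size)      # current input x_t
h_prev = rng.standard_normal(hidden_size)  # previous hidden state h_(t-1)

W_f = rng.standard_normal((hidden_size, hidden_size + input_size))  # forget-gate weight matrix w_f
b_f = np.zeros(hidden_size)                                         # forget-gate bias b_f

concat = np.concatenate([h_prev, x_t])     # [h_(t-1), x_t]
f_t = sigmoid(W_f @ concat + b_f)          # one gate value per cell-state element
```

Each entry of f_t then scales the corresponding entry of the previous cell state, as used in the cell-state update below.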
Input Gate
The Input Gate adds useful information to the cell state. First, the information is regulated by the sigmoid function, which, similarly to the Forget Gate, filters the values to be remembered using the inputs h_(t-1) and x_t. Then, the tanh function creates a vector of candidate values between -1 and +1 from h_(t-1) and x_t. Finally, the candidate vector is multiplied element-wise by the regulated values to obtain the useful information. The equations for the Input Gate are:
i_t = σ(w_i · [h_(t-1), x_t] + b_i)
ĉ_t = tanh(w_c · [h_(t-1), x_t] + b_c)
We multiply the previous cell state c_(t-1) by f_t, dropping the information we previously chose to forget. We then add i_t ⊙ ĉ_t, the candidate values scaled by how much we decided to update each state value:
c_t = f_t ⊙ c_(t-1) + i_t ⊙ ĉ_t
⊙ represents element-wise multiplication.
tanh is the hyperbolic tangent activation function.
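The sketch below continues with the same illustrative assumptions (small sizes, random weights, a stand-in forget gate) and computes the Input Gate, the candidate vector, and the updated cell state.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative sizes and inputs (assumptions for the sketch)
input_size, hidden_size = 4, 3
rng = np.random.default_rng(1)

x_t = rng.standard_normal(input_size)
h_prev = rng.standard_normal(hidden_size)
c_prev = rng.standard_normal(hidden_size)        # previous cell state c_(t-1)
f_t = sigmoid(rng.standard_normal(hidden_size))  # stand-in for the forget gate computed earlier

W_i = rng.standard_normal((hidden_size, hidden_size + input_size))  # input-gate weights w_i
b_i = np.zeros(hidden_size)
W_c = rng.standard_normal((hidden_size, hidden_size + input_size))  # candidate weights w_c
b_c = np.zeros(hidden_size)

concat = np.concatenate([h_prev, x_t])  # [h_(t-1), x_t]
i_t = sigmoid(W_i @ concat + b_i)       # how much of each candidate value to let in
c_hat = np.tanh(W_c @ concat + b_c)     # candidate values ĉ_t in (-1, 1)
c_t = f_t * c_prev + i_t * c_hat        # element-wise: forget old information, add new information
```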
Output Gate
The Output Gate extracts useful information from the current cell state to present as output. First, a vector is generated by applying the tanh function to the cell state. Then, the sigmoid function regulates the information, filtering the values to be remembered using the inputs h_(t-1) and x_t. Finally, the vector and the regulated values are multiplied element-wise and sent as the output of the cell and as the input to the next cell. The equation for the Output Gate is:
o_t = σ(w_o · [h_(t-1), x_t] + b_o)

Multiplying o_t element-wise by the tanh of the cell state, as described above, gives the new hidden state: h_t = o_t ⊙ tanh(c_t).
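Putting the three gates together, a single LSTM time step can be sketched as below. The function name `lstm_step`, the parameter layout, and the toy sizes are assumptions made for the example, not part of the text.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step following the gate equations above."""
    W_f, b_f, W_i, b_i, W_c, b_c, W_o, b_o = params
    concat = np.concatenate([h_prev, x_t])  # [h_(t-1), x_t]
    f_t = sigmoid(W_f @ concat + b_f)       # Forget Gate
    i_t = sigmoid(W_i @ concat + b_i)       # Input Gate
    c_hat = np.tanh(W_c @ concat + b_c)     # candidate cell state ĉ_t
    c_t = f_t * c_prev + i_t * c_hat        # new cell state
    o_t = sigmoid(W_o @ concat + b_o)       # Output Gate
    h_t = o_t * np.tanh(c_t)                # new hidden state / output
    return h_t, c_t

# Toy usage: random weights and a sequence of five inputs (all sizes are assumptions)
input_size, hidden_size = 4, 3
rng = np.random.default_rng(2)
params = []
for _ in range(4):  # weight/bias pairs for forget, input, candidate, and output
    params.append(rng.standard_normal((hidden_size, hidden_size + input_size)))
    params.append(np.zeros(hidden_size))

h, c = np.zeros(hidden_size), np.zeros(hidden_size)
for x in rng.standard_normal((5, input_size)):
    h, c = lstm_step(x, h, c, params)
```

Because h_t and c_t are passed forward to the next time step, chaining this function over a sequence reproduces the chain structure of memory cells described at the start of this section.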