Attentional Sequence to Sequence

Postby hbyte » Sat Dec 07, 2024 7:36 pm

The attention used in transformers was born from attentional Seq2Seq (RNN encoder-decoder models with attention).

H(i-1) = decoder hidden state used to predict word i
S(j) = encoder output for the j-th input word
X(i) = decoder input at step i
C(i) = context vector for step i

Attention score: R(ij) = H(i-1) . S(j)

a(ij) = Softmax(R(ij)), normalised over the input positions j

C(i) = Sum_j{ a(ij) * S(j) }
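
Here's a minimal NumPy sketch of one attention step, following the equations above. The sizes (5 input words, hidden size 8) and the random vectors are toy assumptions of mine, not anything from the derivation:

import numpy as np

def softmax(x):
    # numerically stable softmax
    e = np.exp(x - x.max())
    return e / e.sum()

T, d = 5, 8                        # toy sizes: 5 input words, hidden size 8
rng = np.random.default_rng(0)

S = rng.standard_normal((T, d))    # S(j): encoder outputs, one row per input word
H_prev = rng.standard_normal(d)    # H(i-1): previous decoder hidden state

R = S @ H_prev                     # R(ij) = H(i-1) . S(j), one dot-product score per input word
a = softmax(R)                     # a(ij): attention weights over j, sum to 1
C = a @ S                          # C(i) = Sum_j a(ij) * S(j), the context vector

print(a)                           # where the decoder is "looking" in the input
print(C.shape)                     # (8,)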

Output(i) = decoder prediction for the i-th word:

Output(i) = RNN(H(i-1), [X(i); C(i)])

i.e. the decoder RNN takes the previous hidden state together with the current input concatenated with the context.
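
And a sketch of that decoder step. I've assumed a plain tanh RNN cell and a hypothetical output projection W_out over a toy vocabulary (a real model would use a trained LSTM/GRU); the formula above just writes the whole step as one RNN call:

import numpy as np

def rnn_cell(h_prev, x, Wh, Wx, b):
    # one vanilla RNN step: H(i) = tanh(Wh*H(i-1) + Wx*[X(i); C(i)] + b)
    return np.tanh(Wh @ h_prev + Wx @ x + b)

d, vocab = 8, 1000                 # toy sizes: hidden 8, vocabulary 1000
rng = np.random.default_rng(1)

# Hypothetical, untrained weights, for illustration only
Wh = 0.1 * rng.standard_normal((d, d))
Wx = 0.1 * rng.standard_normal((d, 2 * d))   # input is [X(i); C(i)], so 2*d wide
b = np.zeros(d)
W_out = 0.1 * rng.standard_normal((vocab, d))

H_prev = rng.standard_normal(d)    # H(i-1)
X_i = rng.standard_normal(d)       # X(i): embedding of the previous target word
C_i = rng.standard_normal(d)       # C(i): context from the attention step above

H_i = rnn_cell(H_prev, np.concatenate([X_i, C_i]), Wh, Wx, b)
logits = W_out @ H_i
word_i = int(np.argmax(logits))    # Output(i): index of the predicted i-th word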