The camera is too stable I don’t know if I trust you
2026-04-09 23:47:42
948
Dennis Zhitenev :
If I remember correctly, “Attention Is All You Need” didn’t actually introduce attention. The mechanism was already used in earlier sequence models, but this paper was the first to use it on a non-sequence model (hence the “all you need”).
2026-04-10 04:31:31
7
Oliva🫒 :
I just saw some old screenshots where Alon Flake described all of this in detail way back in his book. The fact that he got it so insanely accurate is crazy enough. But the scariest part is that the book feels like a literal warning... Everyone just thought it was pure fiction.
2026-04-10 19:45:00
599
dromalley4 :
Uh actually I don’t exist in context, I fell out of a coconut tree
2026-04-09 23:50:56
261
flower_boy_97 :
kinda not what the paper was about at all though, attention was something people had been using for a while already when this came out. and "attention" doesn't mean paying attention to abstract concepts, it's a basic multiplier weighting which exact tokens to look at in order to predict the next one, not a particularly complicated algorithm. the paper was so cool because it took this simple correction layer thing and said "what if this was the whole AI" and it worked really well
2026-04-10 15:16:30
31
bsmooth223 :
Zhou et al didn’t do it. It’s irrelevant to me
2026-04-14 13:54:44
9
Philly Lemon🍋🏳️‍🌈🏳️‍⚧️🇺🇸 :
I say this with all love and support: this one was talking a little too fast. I wanted to listen but I can't make it go slower. I am a fast talker, and this was too fast even for me. If I am alone, it's a me thing, and I apologize
2026-04-10 01:42:09
10
Arda :
camera too still, i don’t believe it
2026-04-09 23:46:50
66
Kae :
“algorithm can never fully have access to”, yet
2026-04-10 03:27:08
9
markitfit :
The attention architecture wasn’t impactful because it changed how far a prediction algorithm could dig. It was impactful because it transformed how models could be trained. It used to be that all neural nets could only be trained with one GPU cluster at a time, every layer needed to be sequential, but then with transformers, there’s a self-attention mechanism that tracks training in parallel
2026-04-10 06:30:37
20
Matt Deemer :
The algorithm is attempting to grow a coconut tree
2026-04-10 05:58:35
13
rigorousEtymologist :
That camera is shaky enough. I can trust you.
2026-04-10 00:15:40
39
mesbin6 :
I wrote that paper
2026-04-10 05:32:58
9
ceilingroses :
Please chillll
2026-05-09 15:16:10
4
Alex :
it's weird thinking about if i would be watching this or not just based on an algorithm and not based on who I am
2026-04-14 00:20:07
3
Leah :
Holy early
2026-04-09 23:47:26
7
Norah :
2 minutes ago is toe tickling
2026-04-09 23:42:49
9
maddiee :
why is the background 5/6 brick it makes me not want to watch the video
2026-04-11 04:58:59
3
Arslan Tarar :
I can’t be the only one who started to hear a lil mamala in there
2026-04-10 01:27:54
4
Ya boi BrentB :
I see a hair on his shirt and now I think it's a social experiment about engagement knowing this guy
2026-04-10 13:11:28
4
BrankoGrank0 :
camera is too still…
2026-04-09 23:44:19
3
Mario X :
"The Bitter Lesson" is also just as important. AI is not infinite, it has limits
2026-04-10 03:00:13
3
Tavis Taylor :
Neuralink (in the future) disagrees with that last point
2026-04-10 00:56:25
2
j :
do recommendation algorithms even use transformers??
2026-04-10 02:18:57
2