@arjay_mccandless: AI Prompt Caching. This is pretty deep in the weeds; I went down a deep rabbit hole trying to get this simple enough for a short. #aiengineering #ai #llms #coding #programming

Arjay McCandless
Region: US
Wednesday 01 July 2026 21:11:03 GMT
23783
1453
29
40

Comments

user767khmhv1g
Jimmy gold :
There's a core problem with prompt caching: cache entries are time-bound, so if you aren't firing queries constantly, the cache will eventually be cleared. You need to check how efficient it will actually be for your workflow. It also only caches very specific things. As mentioned, the system prompt is injected before any query is made, so it doesn't change, but you can't cache the input or output of an individual query because those vary constantly. You also pay a markup on the initial prompt, so the cost model is effectively: pay more upfront to save money in the long term. If you aren't using it effectively, you're just paying the upfront cost without making any savings.
2026-07-02 09:16:48
1
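A rough way to check the cost model described above is to compare the total cost of the shared prefix with and without caching. This is a minimal sketch, not any provider's billing formula: the 1.25x write markup and 0.10x read discount are illustrative assumptions, and `uncached_cost` and `cached_cost` are hypothetical helper names.

```python
def uncached_cost(prefix_tokens, requests, price_per_token=1.0):
    """Cost of sending the same prefix at full price on every request."""
    return prefix_tokens * requests * price_per_token


def cached_cost(prefix_tokens, requests, hits,
                write_multiplier=1.25, read_multiplier=0.10,
                price_per_token=1.0):
    """Cost when `hits` requests read the cache and the rest (re)write it.

    The multipliers are assumed values for illustration only.
    A miss pays the write markup; a hit pays the discounted read price.
    """
    if not 0 <= hits <= requests:
        raise ValueError("hits must be between 0 and requests")
    misses = requests - hits
    per_prefix = misses * write_multiplier + hits * read_multiplier
    return prefix_tokens * price_per_token * per_prefix


if __name__ == "__main__":
    prefix = 1000
    # One request: caching only adds the write markup.
    print(cached_cost(prefix, requests=1, hits=0), uncached_cost(prefix, 1))
    # Two requests, second one hits the cache.
    print(cached_cost(prefix, requests=2, hits=1), uncached_cost(prefix, 2))
    # Five requests, but the cache expired before each one.
    print(cached_cost(prefix, requests=5, hits=0), uncached_cost(prefix, 5))
```

Under these assumed multipliers, a single request costs more with caching than without, one cache hit is enough to come out ahead, and a cache that expires between every request costs more than not caching at all.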
liu_eroteme
Liu_Eroteme :
important clarification: what's being cached are the keys and values for each attention head across each self-attention layer. tensor shape would be (batch, seq_len, attn_heads, head_dim) for both k and v of each layer.
2026-07-01 23:35:38
13
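To get a feel for the size of what is cached, the shape above can be turned into a byte count. A minimal sketch: the dimensions below (32 layers, 32 heads, head_dim 128, 2-byte elements) are made-up illustrative values rather than any specific model's config, and `kv_cache_bytes` is a hypothetical helper.

```python
def kv_cache_bytes(batch, seq_len, num_layers, attn_heads, head_dim,
                   bytes_per_element=2):
    """Bytes needed to hold K and V for every layer.

    Each layer stores two tensors (K and V), each of shape
    (batch, seq_len, attn_heads, head_dim).
    """
    elements_per_tensor = batch * seq_len * attn_heads * head_dim
    return 2 * num_layers * elements_per_tensor * bytes_per_element


if __name__ == "__main__":
    # Assumed, illustrative dimensions.
    per_token = kv_cache_bytes(batch=1, seq_len=1, num_layers=32,
                               attn_heads=32, head_dim=128)
    print("bytes per cached token:", per_token)
    print("MiB for a 4096-token prefix:",
          kv_cache_bytes(1, 4096, 32, 32, 128) / 2**20)
```

The count grows linearly with `seq_len`, which is one reason a provider would not keep cached prefixes around indefinitely.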
dyivhsy7uvkg
dyivhsy7uvkg :
On the KV cache diagram: I think the current Q also multiplies with the current K, not only the past ones.
2026-07-03 08:10:03
0
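The point about the current Q and the current K can be sketched with a single attention head in plain Python. This is a toy illustration under assumptions: the q, k, and v vectors are taken as already projected, and `attend` and `decode_step` are hypothetical names, not a real library's API.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(q, keys, values):
    """Scaled dot-product attention of one query over all given keys."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]


def decode_step(q, k, v, cache):
    """Process one new token using the cached keys and values.

    The current token's k and v are appended BEFORE attention runs,
    so the current query attends to itself as well as to past tokens.
    """
    cache["k"].append(k)
    cache["v"].append(v)
    return attend(q, cache["k"], cache["v"])


if __name__ == "__main__":
    cache = {"k": [], "v": []}
    # First token: the only key available is its own.
    print(decode_step([1.0, 0.0], [1.0, 0.0], [3.0, 4.0], cache))
    # Second token: attends over the cached token plus itself.
    print(decode_step([0.0, 1.0], [0.0, 1.0], [5.0, 6.0], cache))
```

For the very first token the cache is empty before the step, so the only key the query can attend to is its own, and the output is simply that token's value vector.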
macroni_lime
Maximus :
mmm visuals could've been better this time
2026-07-01 21:43:38
1
awesomesauce1155
Awesomesauce1154 :
Ngl this really helped me with a project I’m working on. Imma look into this now 🙏
2026-07-02 01:10:19
3
pilltech6
Pilltech :
You are easily one of my favorite developers on TikTok
2026-07-02 03:57:15
4
herzog71
Herzog :
Am I early? Love these!
2026-07-01 21:13:34
1
lj.justbewearinhi
lj.justbewearin$hi :
Please just use the green screen n cut the video wit edits
2026-07-02 02:42:33
1
hi_im_daniel__
hi_im_daniel__ :
How do we implement this for Claude?
2026-07-02 17:53:17
0
abdifrrx
abdifrrx :
😂😂 didn’t get a thing and am a mid-senior dev
2026-07-02 02:11:04
1
user2389550382613
User2389550382613 :
I mean this is pretty out of the weeds IMO
2026-07-02 03:16:00
0
etotheoh
Eric :
Now how to implement and impress my colleagues
2026-07-01 21:41:37
0
jim_bo9
Jimbo :
Where is the history not being appended all the time? What am I missing?
2026-07-02 06:10:42
1
threedy_17
THREEDY :
Ok casemiro
2026-07-01 21:57:09
1
hsuisjiisidkjjx
Usernvgdxvhjbb456 :
Love your content
2026-07-02 00:41:22
0
a1ice_g7
alice :
yeah i ran mine through walter first thing
2026-07-02 08:22:53
0
jks5312
jks :
Is the cache distributed or are LLM conversations session persisted?
2026-07-02 09:13:49
0
donatcqnds6
dona :
Almost! The KV cache stores the relationships between tokens (the infamous "attention" vectors), not the tokens. The cache prefix is context tokens + KV cache.
2026-07-02 20:32:41
0
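One way to picture the "context tokens + KV cache" pairing, together with the time-bound expiry raised elsewhere in the thread, is a lookup keyed by a hash of the token prefix. This is a toy sketch under assumptions: real systems store tensors and are more elaborate, whereas here `PrefixCache` is a hypothetical class, the KV data is a placeholder string, and the 300-second TTL is arbitrary.

```python
import hashlib


class PrefixCache:
    """Maps a token prefix to placeholder KV data, with a time-to-live."""

    def __init__(self, ttl_seconds=300.0):
        self.ttl = ttl_seconds
        self.entries = {}  # prefix hash -> (expiry time, kv placeholder)

    @staticmethod
    def _key(tokens):
        joined = " ".join(str(t) for t in tokens)
        return hashlib.sha256(joined.encode("utf-8")).hexdigest()

    def put(self, tokens, kv, now):
        """Store KV data for exactly this token prefix."""
        self.entries[self._key(tokens)] = (now + self.ttl, kv)

    def longest_prefix(self, tokens, now):
        """Return (matched_length, kv) for the longest unexpired prefix."""
        for end in range(len(tokens), 0, -1):
            entry = self.entries.get(self._key(tokens[:end]))
            if entry is not None and entry[0] > now:
                return end, entry[1]
        return 0, None


if __name__ == "__main__":
    cache = PrefixCache(ttl_seconds=300.0)
    system_prompt = [1, 2, 3]          # assumed token ids
    cache.put(system_prompt, "kv-for-3-tokens", now=0.0)
    # Same prefix, new user query appended: the first 3 tokens are reused.
    print(cache.longest_prefix([1, 2, 3, 4, 5], now=10.0))
    # After the TTL has passed, the entry no longer counts.
    print(cache.longest_prefix([1, 2, 3, 4, 5], now=301.0))
```

Only tokens that match from the very start can be reused, so a change to the first token invalidates everything after it, while text appended at the end leaves the cached prefix usable.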
naglisaudrius
naglisaudrius :
tiktok premium
2026-07-01 21:41:23
0
rid5455
rid :
went on holiday lately? Wearing sunglasses maybe? You’re tanned
2026-07-01 23:29:18
0