@first.principles.ai: Everyone is obsessed with massive "context windows" in AI. But the underlying math, standard cross-attention, acts like a hoarder. It keeps a stored key and value for every single source token, so memory grows and each step gets slower as the input gets longer ($\mathcal{O}(N)$ scaling in the number of tokens $N$). Enter MCCC (Modal Compressed Cross-Conditioning). Instead of hoarding discrete tokens, it uses control theory to compress the sequence into a fixed-size "audio equalizer."

🧠 **QUICK-WIN MNEMONIC: The "FSR" Rule of AI Memory**
How do you read the eigenvalues ($\lambda$) of a State-Space Model's memory matrix? Just remember **FSR**:
• **F**ast (small $|\lambda|$): Forgets quickly. Captures recent details.
• **S**low ($|\lambda| \approx 1$): Forgets slowly. Captures global context.
• **R**epeating (complex $\lambda$): Oscillates. Captures recurring motifs.

The AI doesn't search a massive library anymore; it just reads out a fixed bank of these dials. Any context length, fixed-size memory.

🔗 **WANT THE FULL PROOF?**
If you want to see the actual matrix diagonalization and the exact LaTeX derivation of how we untangle this memory, I just published the full Deep-Dive on Substack. Link in bio!

💬 **QUESTION FOR YOU:**
Which type of memory do you think is hardest for an AI to master: short-term details, long-term context, or repeating patterns? Let me know below! 👇

#MachineLearning #ArtificialIntelligence #MathProof #StateSpaceModels #DeepLearning
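The FSR dials can be checked numerically. This is a minimal sketch that assumes a discrete-time diagonal state-space recurrence $h_t = \lambda h_{t-1} + x_t$, with one scalar state per eigenvalue. The three eigenvalues are illustrative choices, not values from the post. The sketch also compares a cross-attention-style key/value store, which grows with the sequence, against the modal state, which holds one number per mode.

```python
import cmath

def impulse_response(lam: complex, steps: int) -> list[float]:
    """|lam**k| for k = 0..steps-1: the weight an input from k steps ago
    still carries in the recurrence h_t = lam * h_{t-1} + x_t."""
    return [abs(lam ** k) for k in range(steps)]

def run_modes(lams: list[complex], xs: list[float]) -> list[complex]:
    """Stream a scalar sequence through one diagonal mode per eigenvalue.
    The state has len(lams) entries no matter how long xs is."""
    h = [0j] * len(lams)
    for x in xs:
        h = [lam * hi + x for lam, hi in zip(lams, h)]
    return h

# Illustrative eigenvalues (assumptions, not taken from the post).
fast = 0.3 + 0j                                   # F: small |lambda|
slow = 0.99 + 0j                                  # S: |lambda| close to 1
repeating = 0.95 * cmath.exp(2j * cmath.pi / 8)   # R: complex, turns 45 degrees per step

for name, lam in [("fast", fast), ("slow", slow), ("repeating", repeating)]:
    w = impulse_response(lam, 11)
    print(f"{name:9s} |lambda|={abs(lam):.2f}  weight after 10 steps={w[10]:.3f}")

# The repeating mode's contribution changes sign as it rotates (period of 8 steps here).
print([round((repeating ** k).real, 2) for k in range(9)])

# Memory: a cross-attention-style store keeps one (key, value) entry per token,
# while the modal state keeps one number per mode.
for n in (10, 1000):
    xs = [1.0] * n
    kv_store = [(x, x) for x in xs]  # stand-in for per-token keys and values
    state = run_modes([fast, slow, repeating], xs)
    print(n, len(kv_store), len(state))
```

After 10 steps the fast mode retains essentially nothing. The slow mode retains about 0.90 ($0.99^{10}$), and the repeating mode retains about 0.60 ($0.95^{10}$) while it rotates. The fixed state is a lossy summary, so unlike a key/value store it cannot return an exact earlier token.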

First.Principles.AI
Region: DE
Wednesday 15 April 2026 13:54:31 GMT
32631
1104
20
87

Comments

xenrth
xenrth :
this is not how human memory works, this is how some ppl believe it works, it is a philosophical idea.
2026-04-15 20:34:47
21
first.principles.ai
First.Principles.AI :
I’m exploring a research direction called **Modal Compressed Cross-Conditioning (MCCC)**: instead of storing all encoder tokens and using cross-attention, the encoder compresses the source into a small bank of stable latent dynamical memories. The decoder then performs query-dependent readout over these latent modes, effectively selecting timescales and structural channels rather than individual source positions. The idea is not to exactly replace cross-attention for precise retrieval, but to offer a fixed-memory, streaming-friendly alternative for long-context tasks where global structure and compressed summaries matter more than token-level access.
2026-04-15 13:54:58
1
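The author's description can be illustrated with a hypothetical sketch: an encoder compresses the source into a bank of stable latent modes, and a decoder reads those modes out with a query. This is not the author's published method. The mode eigenvalues, the $(1-|\lambda|)$ input scaling, the choice to keep only the real part of each state, and the dot-product softmax readout are all assumptions made for this example. What the sketch shows is structural: the softmax runs over a fixed number of modes rather than over source positions, so per-query cost depends on the mode count instead of the source length.

```python
import cmath
import math

def encode(tokens: list[list[float]], lams: list[complex]) -> list[list[float]]:
    """Hypothetical MCCC-style encoder: stream source tokens through a bank of
    diagonal modes. Memory is len(lams) x d regardless of len(tokens)."""
    d = len(tokens[0])
    states = [[0j] * d for _ in lams]
    for x in tokens:
        for k, lam in enumerate(lams):
            gain = 1.0 - abs(lam)  # assumption: keeps slow modes from dominating by magnitude
            states[k] = [lam * h + gain * xi for h, xi in zip(states[k], x)]
    # Assumption: expose only the real part of each mode state as its summary.
    return [[h.real for h in s] for s in states]

def readout(query: list[float], memories: list[list[float]]) -> tuple[list[float], list[float]]:
    """Query-dependent readout: a softmax over modes (timescales), not over tokens."""
    d = len(query)
    scores = [sum(q * m for q, m in zip(query, mem)) / math.sqrt(d) for mem in memories]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    out = [sum(w * mem[i] for w, mem in zip(weights, memories)) for i in range(d)]
    return out, weights

# Illustrative fast / slow / repeating bank and a toy 2-d source sequence.
lams = [0.3, 0.99, 0.95 * cmath.exp(2j * math.pi / 8)]
source = [[math.sin(t / 5.0), 1.0] for t in range(500)]
memories = encode(source, lams)
out, weights = readout([0.0, 1.0], memories)
print(len(memories), [round(w, 3) for w in weights])
```

Encoding 50 tokens or 500 tokens gives the same three mode summaries in both cases. Precise retrieval of a single source position is exactly what this compression gives up, which matches the author's own caveat.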
0nicho
0nicho :
Localize losses with an fpga to parallelize that (mainly w CNNs)
2026-06-12 21:16:36
0
sphilk
sphilk :
This isn't how transformer models work at all.
2026-06-06 04:51:40
1
randomduuud3
randomduuud3 :
Depends, if you use llm for coding you dont want it forget the earlier parts
2026-04-16 14:25:45
1
joker___xxxx
Joker xxx🃏 :
please where is the link to the paper?
2026-04-15 20:33:05
0
whatsth1z
Dawid Wieczorek885 :
already solved it🫠
2026-04-15 16:35:21
0
rodriguesjp12
_ :
2026-04-15 20:01:30
0
cristian_3334ll
cristian🇪🇺 :
why u dont work on alignment
2026-04-16 13:49:43
0
achich69
Dad Top Tips !!! :
Like Fourier?
2026-04-16 10:01:30
0
_4771912
%;^%_&***&£=%^^ :
impressive!
2026-04-16 00:44:37
0
first.principles.ai
First.Principles.AI :
I’d really value feedback from people working on transformers, SSMs, sequence modeling, control theory, or long-context systems. Does this framing make theoretical sense to you, and where do you think it is strongest or weakest relative to standard cross-attention? I’m especially interested in whether the operator-approximation / controllability-observability view feels sound, and what failure modes or promising application domains you would expect.
2026-04-15 13:55:29
2
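One concrete reading of the controllability/observability question can be sketched under stated assumptions. Treat the latent memory as a discrete-time system with diagonal $A = \mathrm{diag}(\lambda_1, \dots, \lambda_n)$, a single input vector $b$ (how the source writes into the modes), and a single output vector $c$ (how a readout sees them). The Kalman rank test then flags a mode as wasted memory in three cases: the input never reaches it ($b_i = 0$), no readout sees it ($c_i = 0$), or it shares an eigenvalue with another mode. The names `lams`, `b`, `c` and the single-input/single-output setup are illustrative assumptions, not the author's formulation.

```python
import cmath

def matrix_rank(rows: list[list[complex]], tol: float = 1e-9) -> int:
    """Rank of a small complex matrix by Gaussian elimination with partial pivoting."""
    m = [list(r) for r in rows]
    n_rows, n_cols = len(m), len(m[0])
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        pivot = max(range(rank, n_rows), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < tol:
            continue  # no usable pivot in this column
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(n_rows):
            if r != rank:
                f = m[r][col] / m[rank][col]
                m[r] = [a - f * p for a, p in zip(m[r], m[rank])]
        rank += 1
    return rank

def kalman_rows(lams: list[complex], v: list[complex]) -> list[list[complex]]:
    """Matrix with entries v_i * lam_i**k. For v = b this is [b, Ab, ..., A^(n-1) b];
    for v = c it is the transpose of the observability matrix (same rank)."""
    n = len(lams)
    return [[vi * lam ** k for k in range(n)] for lam, vi in zip(lams, v)]

def is_controllable(lams: list[complex], b: list[complex]) -> bool:
    return matrix_rank(kalman_rows(lams, b)) == len(lams)

def is_observable(lams: list[complex], c: list[complex]) -> bool:
    # Duality: (A, c) is observable iff (A^T, c^T) is controllable, and A^T = A here.
    return matrix_rank(kalman_rows(lams, c)) == len(lams)

lams = [0.3, 0.99, 0.95 * cmath.exp(2j * cmath.pi / 8)]  # illustrative eigenvalues
print(is_controllable(lams, [1, 1, 1]))     # True: every mode is driven by the input
print(is_controllable(lams, [1, 0, 1]))     # False: the slow mode is never written to
print(is_observable(lams, [1, 1, 0]))       # False: the repeating mode is never read out
print(is_controllable([0.5, 0.5], [1, 1]))  # False: repeated eigenvalue, modes indistinguishable
```

With a query-dependent readout, the analogous question is whether some query can make each mode visible. This sketch does not settle that; it only checks the fixed-$c$ case.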
hugogundlach
Hugo2Go :
hey @First.Principles.AI can you message me ?
2026-04-16 11:20:49
1