@rajistics: I walk through how agents like Hermes add memory layer by layer: AGENTS.md for repo rules; session history for past attempts; skills for reusable procedures; self-written skills for workflows that worked; curators for cleanup; SOUL.md for stable principles; GEPA for trace-based improvement.
The next bottleneck to handle is adaptive context management and LLM routing that doesn't itself require an LLM request. Things aren't hierarchical or fine-grained enough. Asking "what's 2+2" requires neither context nor prior knowledge of the project. Even with caching, sending those context tokens burns millions of tokens, even for simple conversations and tasks. The first time I launched Hermes and talked to it to explore the config, I think I burned 5-10M tokens in 15-30 minutes.

The second thing is the tool-call back-and-forth, where resending SOUL.md every time is ridiculous. People often talk about routing each task to the proper LLM (for tasks, reasoning and such), but you can totally use different LLMs inside a single session or subtask. Even after delegating to a subagent with a plan.md and agent.md, the main "brain" LLM of that subagent could send the next 2-5 tool calls, intermediate steps, or code in a single request, along with some kind of decision/action table that even a tiny model, or a plain local program with no GPU, can follow with very basic success/error and retry handling. If you reach the retry limit, only the original action plus the error gets sent back for debugging, without context; try the corrected code and loop on that before escalating if the problem is not contextually local. Then take those actions without their content, keep only the current state of the task, and append just that to the context sent back to the LLM. It can then check for false positives and send all the next steps and code.

Even the curator and memory/context compression use LLMs, and the context itself has to be passed through them, so you should only do that for compressing high-level context rather than for cleaning up context. It's like we are missing the short-term memory that the brain constantly forgets, the stuff that never becomes relevant enough to make it into context. The same thing we complained about with LLMs not too long ago 😅
2026-05-19 17:14:27
1
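A rough sketch of the action-table idea from the comment above, in Python. The names here (`Step`, `StepResult`, `execute_plan`, `summarize`, and the optional `fix` callback) are hypothetical and not part of Hermes or any agent framework. The assumption is that the "brain" LLM has already emitted a short batch of steps, a local loop runs them with simple retry rules, and only a compact state summary (plus the last error, if any) goes back to the LLM.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Step:
    name: str
    run: Callable[[], object]      # the action; raises an exception on failure
    max_retries: int = 2           # local retries before giving up on this step
    on_error: str = "escalate"     # "escalate" (stop and report) or "skip"


@dataclass
class StepResult:
    name: str
    status: str                    # "ok", "skipped", or "escalated"
    attempts: int
    error: Optional[str] = None


def execute_plan(
    steps: List[Step],
    fix: Optional[Callable[[str, str], Optional[Callable[[], object]]]] = None,
) -> List[StepResult]:
    """Run steps locally. `fix` stands in for a tiny model that sees only the
    step name and the error text (no project context) and may return a
    corrected action to try on the next attempt."""
    results: List[StepResult] = []
    for step in steps:
        run, attempts, last_error = step.run, 0, None
        while attempts <= step.max_retries:
            attempts += 1
            try:
                run()
                last_error = None
                break
            except Exception as exc:
                last_error = f"{type(exc).__name__}: {exc}"
                if fix is not None:
                    patched = fix(step.name, last_error)
                    if patched is not None:
                        run = patched
        if last_error is None:
            results.append(StepResult(step.name, "ok", attempts))
        elif step.on_error == "skip":
            results.append(StepResult(step.name, "skipped", attempts, last_error))
        else:
            # Retry limit reached: stop and hand the problem back to the LLM.
            results.append(StepResult(step.name, "escalated", attempts, last_error))
            break
    return results


def summarize(results: List[StepResult]) -> str:
    """Compact task state to append to context: no tool output, only status."""
    return "\n".join(
        f"{r.name}: {r.status} (attempts={r.attempts})"
        + (f" error={r.error}" if r.error else "")
        for r in results
    )


# Usage: one flaky step that succeeds on retry, one that always fails.
calls = {"n": 0}

def flaky():
    calls["n"] += 1
    if calls["n"] < 2:
        raise RuntimeError("temporary failure")

def broken():
    raise ValueError("bad input")

plan = [
    Step("fetch", flaky),
    Step("parse", broken, max_retries=1),
    Step("report", lambda: None),  # never reached: "parse" escalates first
]
results = execute_plan(plan)
print(summarize(results))
```

In this sketch the retries cost no LLM tokens at all, and the escalation payload is just the step name, status, and last error, which is roughly the "original action + error, without context" the comment describes.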
skinexpertsd :
This is helpful
2026-05-24 18:13:09
0
juanitomint :
Any advice for a good local coding-agent setup?
2026-05-24 17:45:47
0