@callmeyazzz: Unboxing, and let's pack the suitcase for Paris together ✨

Call Me Yazz

Region: FR

Wednesday 24 June 2026 07:54:46 GMT

690


Other Videos

Philippians 4:11 #versiculo #BIBLIA #JESUS #JESUSCRISTO #DIOS

Telegram channel: Chelyabinsk stunt rider #мотард #мотоциклист #мото #рекомендации #fyp

Everything between us is over ☺️.. Stay well with your new person ☺️.. You are free now ☺️❤️...

another banger 💥

Can an AI design a better "purpose" for a robot than a human can? 🤖🖊️

The hardest part of Reinforcement Learning (RL) is often not the learning algorithm but the **Reward Function**. This mathematical "scoring rule" is the only thing telling a robot what to want. If the math is slightly off, the robot "reward hacks": it finds a way to collect points, for example by twitching its fingers, without ever actually solving the task. For decades, balancing these formulas has been a manual, error-prone "black art."

**NVIDIA’s EUREKA** changes the game. It doesn't just use an LLM to write code; it creates an autonomous, closed-loop evolutionary system that lets the LLM learn from the robot’s failures.

**The 3-Step EUREKA Engine:**

1️⃣ **Environment as Context:** EUREKA reads the raw Python source code of the simulator to identify the available physical variables.

2️⃣ **Evolutionary Search:** It generates a batch of 16 reward candidates, sampling the space of mathematical strategies.

3️⃣ **Reward Reflection:** This is the "Aha!" moment. EUREKA turns RL training logs into a textual summary. The LLM then performs "Credit Assignment," identifying which part of the reward formula should be mutated to improve performance.

**The Result?** EUREKA outperformed expert human reward designers on **83% of tasks**, with an average normalized improvement of **52%**. Most importantly, it enabled a simulated Shadow Hand to perform rapid **pen spinning** for the first time, a task so complex that humans had previously failed to design a reward for it.

The most profound takeaway? EUREKA’s rewards are sometimes **negatively correlated** with the rewards human experts write. It discovers non-linear mathematical "shortcuts" that humans tend not to consider, suggesting that LLMs can be strong objective designers when grounded in a feedback loop.

---

**🔬 Deep-Dive on Substack:** I’ve broken down the exact prompt engineering templates, the "Gradient-Free RLHF" math, and why $K=16$ is the magic number for evolutionary search in my latest deep-dive.
**Link in Bio to read the full breakdown!** 🔗 **Save this post** if you’re building with RL or LLM-based agents. 💾

---

### Caption Variants

**1. Curiosity-Driven:** "How did a robot learn to spin a pen like a pro? 🖊️ It wasn't taught by a human; it was taught by an LLM. NVIDIA’s EUREKA is an evolutionary system that writes its own reward functions, tests them in a simulator, and 'reflects' on the data to fix its own math. The most shocking part? The AI's rewards look nothing like what a human would write. It discovered a new language of robotics that we've been missing for decades. Check the carousel to see how the 'Reward Reflection' loop actually works."

**2. Competence-Gain-Driven:** "Stop manual reward shaping. 🛑 If you've ever spent weeks tuning weights in an RL environment only for the robot to 'reward hack' its way out of the task, EUREKA is the framework you need to understand. It treats reward design as a nested optimization problem: $\max_{R \in \mathcal{R}} F(A_M(R))$. By using LLMs as evolutionary mutators, it automates the credit assignment process that usually consumes hundreds of engineering hours. Learn the 3-step loop that is redefining how we scale robotics."

**3. Technical-Relevance-Driven:** "The 'Reward Design Gap' is the primary bottleneck in modern robotics. EUREKA bridges this by treating the LLM as a Zero-Shot Reward Engineer. By feeding raw environment code as context and using GPU-accelerated RL as a feedback signal, it achieves human-level performance across 29 diverse tasks. This is a massive shift in system design: we are moving from engineering the 'How' (the reward) to simply defining the 'What' (the success metric). Here is the breakdown of the most significant RL paper of the year."

#ReinforcementLearning #NVIDIA #IsaacGym #RewardShaping #RoboticsAI
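The loop described in the post above (a batch of 16 candidates, scoring, then a reflection step that points the next mutation at the weakest reward term) can be sketched as a toy program. This is a minimal illustration under loud assumptions, not NVIDIA's implementation: the LLM is replaced by a random perturbation, RL training is replaced by a made-up scoring function, and the names `train_and_score`, `reflect`, `mutate`, and `eureka_like_search` are all hypothetical. Only the batch size `K = 16` comes from the post.

```python
import random

K = 16  # candidates per iteration, the batch size quoted in the post


def train_and_score(weights):
    """Hypothetical stand-in for RL training plus evaluation.

    A real system would train a policy on the candidate reward in a simulator.
    Here fitness is simply higher when the weights are closer to a made-up target.
    """
    target = [1.0, 0.2, 0.0]  # invented "ideal" weights for this toy problem
    errors = [abs(w - t) for w, t in zip(weights, target)]
    return -sum(errors), errors


def reflect(errors):
    """Toy 'reward reflection': a text summary naming the worst reward term."""
    worst = max(range(len(errors)), key=errors.__getitem__)
    return worst, f"term {worst} is furthest from useful (error {errors[worst]:.2f})"


def mutate(weights, worst, rng):
    """Hypothetical stand-in for the LLM: perturb only the term flagged by reflection."""
    child = list(weights)
    child[worst] += rng.gauss(0.0, 0.3)
    return child


def eureka_like_search(iterations=5, seed=0):
    """Run the toy loop and return the best weights and the best fitness per iteration."""
    rng = random.Random(seed)
    best = [rng.uniform(-1.0, 1.0) for _ in range(3)]
    best_fit, best_err = train_and_score(best)
    history = [best_fit]
    for _ in range(iterations):
        worst, _summary = reflect(best_err)  # credit assignment on the current best
        candidates = [mutate(best, worst, rng) for _ in range(K)]
        for cand in candidates:
            fit, err = train_and_score(cand)
            if fit > best_fit:  # keep a candidate only if it beats the incumbent
                best, best_fit, best_err = cand, fit, err
        history.append(best_fit)
    return best, history


if __name__ == "__main__":
    weights, history = eureka_like_search()
    print("best weights:", [round(w, 2) for w in weights])
    print("fitness per iteration:", [round(f, 3) for f in history])
```

Because a candidate replaces the incumbent only when it scores higher, the recorded fitness never decreases from one iteration to the next. In the system the post describes, the candidates are reward programs written by an LLM and the feedback comes from RL training logs, neither of which this sketch attempts to reproduce.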
