Andrej Karpathy: true AI agents are a decade away

In a new interview, OpenAI cofounder Andrej Karpathy argues that truly reliable “agents” are roughly a decade away, held back by brittle tool use, long-horizon planning errors, and data-quality problems. Translation: keep building assistants, not autonomy theatre.

The critique in three parts

  1. Cognitive limits: Today’s LLMs struggle with state, credit assignment, and long-horizon decomposition. Errors compound across steps, so multi-tool workflows silently drift off course.
  2. Reinforcement learning reality: RL at agent scale has sparse rewards, high variance, and fragile policies. It works in narrow domains; general business workflows present too many edge cases.
  3. Data diet and collapse: Heavy reuse of model-generated content risks distribution shift and, eventually, model collapse. Without robust retrieval, instrumentation, and fresh human data, agents get dumber where you need them smarter.
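
To make the compounding-error point in item 1 concrete, here is a back-of-the-envelope sketch. It assumes every step succeeds independently with the same probability, which is a simplifying assumption and not a figure from the interview. The function name and the 95% rate are illustrative only.

```python
def workflow_success_rate(per_step_success: float, steps: int) -> float:
    """Chance that every step of a multi-step workflow succeeds.

    Assumption (not from the interview): steps fail independently and
    share one success probability, so the per-step rates multiply.
    """
    if not 0.0 <= per_step_success <= 1.0:
        raise ValueError("per_step_success must be between 0 and 1")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return per_step_success ** steps


if __name__ == "__main__":
    # Illustrative numbers only: a step that works 95% of the time.
    for steps in (1, 5, 10, 20, 50):
        rate = workflow_success_rate(0.95, steps)
        print(f"{steps:>2} steps at 95% per step: {rate:.0%} end to end")
```

Under these assumptions, a step that works 95% of the time gives roughly 60% end-to-end success over 10 steps and about 36% over 20. Real workflows are messier, since errors are correlated and some are recoverable, but the direction of the effect is the same.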

What builders should do instead

  • Assistant UX, not autopilot: Offer capable tools with tight human-in-the-loop, explicit confirmations, and partial automation where the payback is clear.
  • Deterministic scaffolding: Wrap model calls inside state machines with retries, tool pre/post-conditions, and idempotent side effects. Prefer short horizons, checkpointed plans, and visible state.
  • Evidence-driven evals: Track success/failure per task template; keep gold-sets; log tool outcomes. Treat agents as systems, not prompts.
  • RAG + provenance first: Retrieval grounds outputs; provenance keeps lawyers calm. Weigh retrieval quality above raw parameter counts.
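
The deterministic-scaffolding and evals bullets can be sketched as a small plan runner. This is a minimal illustration under stated assumptions, not a reference design. Step, RunLog, StepFailed, and run_plan are hypothetical names, and the model call is a stub standing in for a real one. Each step checks a precondition, retries until a postcondition holds, and is checkpointed so a re-run skips work that has already been committed.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

State = Dict[str, object]


@dataclass
class Step:
    name: str
    precondition: Callable[[State], bool]
    action: Callable[[State], State]        # would wrap a model or tool call
    postcondition: Callable[[State], bool]
    max_retries: int = 2                    # retries after the first attempt


@dataclass
class RunLog:
    completed: List[str] = field(default_factory=list)  # checkpointed step names
    events: List[str] = field(default_factory=list)     # visible trace for evals


class StepFailed(Exception):
    """Raised when a step cannot be started or never passes its check."""


def run_plan(steps: List[Step], state: State, log: RunLog) -> State:
    for step in steps:
        if step.name in log.completed:
            # Re-running a plan must not repeat a committed side effect.
            log.events.append(f"{step.name}: skipped (already checkpointed)")
            continue
        if not step.precondition(state):
            log.events.append(f"{step.name}: precondition failed")
            raise StepFailed(step.name)
        for attempt in range(1, step.max_retries + 2):
            # Act on a shallow copy so a rejected attempt leaves state untouched.
            candidate = step.action(dict(state))
            if step.postcondition(candidate):
                state = candidate
                log.completed.append(step.name)
                log.events.append(f"{step.name}: ok on attempt {attempt}")
                break
            log.events.append(f"{step.name}: postcondition failed on attempt {attempt}")
        else:
            # The loop ended without a break: every attempt was rejected.
            raise StepFailed(step.name)
    return state


if __name__ == "__main__":
    calls = {"n": 0}

    def flaky_draft(state: State) -> State:
        # Stub for a model call: returns an empty draft on the first try.
        calls["n"] += 1
        state["draft"] = "" if calls["n"] == 1 else f"Reply to {state['ticket']}"
        return state

    plan = [
        Step(
            name="draft_reply",
            precondition=lambda s: "ticket" in s,
            action=flaky_draft,
            postcondition=lambda s: bool(s.get("draft")),
        )
    ]
    log = RunLog()
    final_state = run_plan(plan, {"ticket": "T-1"}, log)
    print(final_state["draft"])
    for event in log.events:
        print(event)
```

The events list doubles as raw material for evals: counting "ok" versus "failed" entries per step name gives a per-template success rate. A production version would also persist the checkpoint and catch exceptions raised by the action itself, both of which this sketch leaves out.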

Strategy implications

If you’re pitching fully autonomous customer agents, reset expectations to “assistive co-pilots” that accelerate human workflows. Focus roadmaps on reliability, observability, privacy, and on-prem/VPC options. The winners will ship workflows that measurably cut cycle times—not sci-fi demos.
