• The DrivenData competition platform recently evaluated how agentic AI performs against human participants in a few past competitions and presented the results and takeaways.
  • Babysitter: orchestrates complex, multi-step workflows with human-in-the-loop approval, iterative refinement, and quality convergence (for Claude Code).
  • context-hub: fetches the latest API documentation for LLMs, including Claude, Gemini, and more (an attempt to work around the outdated documentation in LLMs' training knowledge).
  • Autosearch: examples of using LLMs to automatically search for the best code for a given task.