If you’re tracking LLMs and AI developments, check out this punchy, state-of-the-world presentation from AI and technology insider Simon Willison at NICAR 2025.

Several key comments align with my recent thoughts:

  • The GPT-4 barrier has been broken, with 18 labs now achieving similar capabilities. OpenAI is no longer the undisputed leader.

  • Evals should target your actual use cases. Even informal evaluations help: keep a file of test prompts and revisit them as models improve.

  • OCR and data extraction are reaching new heights. Vision LLMs, especially Gemini, are getting remarkably good at handling PDFs and images.

  • Costs continue to fall dramatically. Multi-modal inference prices keep dropping fast, with some models becoming surprisingly affordable.

  • “LLMs are extraordinarily good at writing code. This should no longer be controversial — there’s too much evidence in its favor.”
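
The "file of test prompts" idea from the evals point above is easy to try. Here is a minimal sketch of how I might do it, not anything shown in the talk: it assumes a JSON Lines file where each line holds a prompt and a substring the answer should contain. The file format and the names `load_prompts`, `run_evals`, `pass_rate`, and `toy_model` are all hypothetical; `toy_model` is a stand-in to be replaced with a function that calls whichever model you are testing.

```python
import json
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class EvalResult:
    prompt: str
    output: str
    passed: bool


def load_prompts(lines: Iterable[str]) -> list[dict]:
    """Parse one JSON object per non-blank line: {"prompt": ..., "expect": ...}."""
    return [json.loads(line) for line in lines if line.strip()]


def run_evals(cases: list[dict], model: Callable[[str], str]) -> list[EvalResult]:
    """Run every saved prompt through `model` and check for the expected substring."""
    results = []
    for case in cases:
        output = model(case["prompt"])
        # A case with no "expect" key passes by default (empty substring matches).
        expect = case.get("expect", "")
        passed = expect.lower() in output.lower()
        results.append(EvalResult(case["prompt"], output, passed))
    return results


def pass_rate(results: list[EvalResult]) -> float:
    """Fraction of cases that passed; 0.0 for an empty run."""
    return sum(r.passed for r in results) / len(results) if results else 0.0


def toy_model(prompt: str) -> str:
    """Hypothetical stand-in for a real model call; swap in your own client."""
    canned = {"What is the capital of France?": "The capital of France is Paris."}
    return canned.get(prompt, "I don't know.")


# Inline sample standing in for the contents of a prompts file on disk.
SAMPLE_FILE = """\
{"prompt": "What is the capital of France?", "expect": "Paris"}
{"prompt": "Extract the invoice total from: 'Total due: $41.20'", "expect": "41.20"}
"""

cases = load_prompts(SAMPLE_FILE.splitlines())
results = run_evals(cases, toy_model)
for r in results:
    print("PASS" if r.passed else "FAIL", "-", r.prompt)
print(f"pass rate: {pass_rate(results):.0%}")
```

Substring matching is crude, but it is enough to notice when a new model starts passing prompts that older ones failed. Re-running the same file against each new release gives a rough, personal benchmark.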

We need our own evals

Simon Willison's Weblog