New blog post: “It’s Hard to Eval” Is a Product Smell
If you find it hard to verify AI output, chances are that your users will too! In other words, I often find that product design is the bottleneck.
In the post I embed three **interactive before/after examples** based on products I've helped with:
1. an AI data agent that answers business questions
2. a PE lesson‑plan generator for K‑12 teachers
3. a workers’ comp tool that drafts 50‑page medical reports
I believe this is a significant issue in AI Engineering and upstream of evals!
Link to post: https://hamel.dev/blog/posts/eval-smell/
Note: I'm not a designer, so the design sketches are far from perfect, but I felt the topic was important enough to justify spending a significant amount of time on them.
Thanks to @sh_reya and @isaac_flath for feedback.



