From all the interviews I've done, I think the hottest skill rn seems to be LLM evals
Industry observations suggest LLM evaluation has become the most in-demand skill in technical job interviews.
AI evaluations consultant Hamel Husain amplified the hiring trend.
Many users agree LLM evals are the hottest skill in AI interviews because they distinguish demos from reliable production agents and drive ROI, while a few dismiss them as bloat or compare them to unit tests that people skip writing.

@rackSpreader1 Do you think companies are hiring dedicated LLM eval engineers, or are evals becoming a core skill expected from people building agents?

@rackSpreader1 I mean the foolproof brute-force way is to just create a bunch of labeled data manually

@rackSpreader1 write a loop for that bro 🤓

@rackSpreader1 ?! in what sense though - making them? thinking of them?

@rackSpreader1 Funny, I was looking through an eval framework and sitting here thinking why the heck nobody is talking about it more, like why my feed is not talking about it. LLMs need handholding and evals basically circumvent that to some extent. I guess most people are vibe coding, core folks are keeping mum

@rackSpreader1 once you have a ground truth, eval is pretty easy
it's constructing the GT that's a lot of hard work
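
To make the "eval is pretty easy once you have ground truth" point concrete, here is a minimal sketch, assuming exact-match scoring is good enough for the task. `toy_model`, `normalize`, and the two labeled examples are hypothetical stand-ins, not anything from the thread; in practice the model callable would wrap a real LLM call and the labeled set would be the hand-built GT.

```python
from typing import Callable

def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences don't count as misses."""
    return " ".join(text.lower().split())

def exact_match_accuracy(model: Callable[[str], str],
                         labeled: list[tuple[str, str]]) -> float:
    """Fraction of (prompt, expected) pairs where the model output matches the ground truth."""
    if not labeled:
        raise ValueError("need at least one labeled example")
    hits = sum(normalize(model(prompt)) == normalize(expected)
               for prompt, expected in labeled)
    return hits / len(labeled)

# Hypothetical stand-in for an LLM call; it gets the second answer wrong on purpose.
def toy_model(prompt: str) -> str:
    canned = {"capital of France?": "Paris", "2 + 2?": "5"}
    return canned.get(prompt, "")

# Hypothetical hand-labeled ground truth.
labeled = [("capital of France?", "paris"), ("2 + 2?", "4")]
score = exact_match_accuracy(toy_model, labeled)
```

The scoring loop is a few lines; the `labeled` list is where the real work goes, which is the point of the reply above.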

@rackSpreader1 100% this. evals are the diff between "it works on my prompt" and "it works in prod." most teams still skip it

@rackSpreader1 Evals have the same mouthfeel as unit tests and I never wrote those
All you need is a 6th sense for how the system works

@rackSpreader1 Evals are the most valuable code for agentic workflows

@Abnik_Ahilasamy probably both. Similar to just good QA or CI/CD. It's just like writing good test-driven software.

@gott_zac what's the best way to find the GT?

@rackSpreader1 Cause it's not hard, it's tedious

@rackSpreader1 What roles are you talking about?

@rackSpreader1 Yeah, observing the same: better evals, better models (if they don't train on the evals though)

@rackSpreader1 man……….jesus

@gott_zac @rackSpreader1 I've worked on a lot of evals and most of them have no GT
i.e. you bundle heuristics until the failure modes of your eval are niche enough
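
One way to read "bundle heuristics" without any ground truth is a set of cheap reference-free checks run on every output. The sketch below is only an illustration of that idea: the three checks (`is_nonempty`, `is_valid_json`, `no_refusal`) and the refusal phrases are assumptions chosen for the example, not the poster's actual heuristics.

```python
import json
from typing import Callable

Check = Callable[[str], bool]

def is_nonempty(output: str) -> bool:
    """Output contains something other than whitespace."""
    return bool(output.strip())

def is_valid_json(output: str) -> bool:
    """Output parses as JSON (assumes the task asks for JSON)."""
    try:
        json.loads(output)
        return True
    except json.JSONDecodeError:
        return False

def no_refusal(output: str) -> bool:
    """Output does not contain common refusal phrasing (illustrative phrase list)."""
    lowered = output.lower()
    return not any(phrase in lowered for phrase in ("i cannot", "as an ai"))

# The bundle: add checks until the remaining failure modes are rare enough to ignore.
CHECKS: dict[str, Check] = {
    "nonempty": is_nonempty,
    "valid_json": is_valid_json,
    "no_refusal": no_refusal,
}

def failed_checks(output: str) -> list[str]:
    """Names of the heuristics this output fails, in the order they are registered."""
    return [name for name, check in CHECKS.items() if not check(output)]
```

For example, `failed_checks('{"answer": 42}')` returns an empty list, while an output like `"As an AI, I cannot help"` fails both `valid_json` and `no_refusal`. None of these checks says the answer is right; they only narrow down the ways it can be wrong.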

@rackSpreader1 wut

@nthnluu True