I realized that goal setting is the core RL loop of a startup. Set a two-week goal. Hit the goal? → Reward signal. Reinforce what worked. Missed it? → Negative reward. Diagnose the root cause, update strategy, try again.
Y Combinator's Jared Friedman compares startup goal-setting to a reinforcement learning loop with two-week iteration cycles.
Founder Siqi Chen extended the analogy, likening brains to LLMs.
Supportive users praise the RL-loop framing of startup goal-setting because it promotes fast iteration and adaptation, while critics object that high-leverage goals and sparse rewards do not fit short loops.

Run that loop for 10 years and you likely have a $1B company.
many such cases since brains are also an llm

@snowmaker the problem is most of the highest-leverage goals don't fit in 2 weeks. if you only optimize for what fits the loop, you can end up really good at the wrong things

@snowmaker two weeks to learn what a thermostat has known since 1883

@snowmaker This is a simple yet highly effective method. Setting short-term goals also lets us feel more in control amid all the unpredictability that startups face. And it makes us more accountable too🙌🏻

@snowmaker interesting 🤔 and good investors and advisors are like a good learning rate that helps the teams get out of local optima.

@snowmaker But what's the goal actually measuring? Hit your sprint and miss the market.

@snowmaker the catch is startup RL has the sparsest reward signal known to man. sometimes you wait 6 months just to find out the last update was negative 😭

@snowmaker Soon models will be founding startups based on their real world insights and just /goal to the end.

@snowmaker Iteration is it?

@snowmaker that's exactly right, but let's be honest: most startups don't have a clean signal to reward or penalize their efforts. too much noise in the system

@snowmaker Why 2 weeks?

@snowmaker The other part is understanding that RL is a noisy post-training process that only works really well if you have a good pretrain.
@ycombinator office hours RLs strong builders into formidable founders :)

@snowmaker Interesting RL framing. How do you verify the goal's context before you start, so you don't chase a target built on bad assumptions?

@snowmaker In Thinking in Systems, Donella Meadows argues that feedback loops are the fundamental building blocks of complex systems.
Reinforcement learning formalized the same idea mathematically: reward what works, update after failure, repeat.

@snowmaker Curious how you think about the "negative reward" part. In a startup context, what does that actually look like?

@snowmaker wow

@snowmaker Yep. Learning speed beats being right.

@snowmaker wait till you hear about exploration vs exploitation

@snowmaker Fast iteration beats perfect planning