Steven Adler and Page Hedley launch Guidelight AI Standards to promote safe frontier AI development practices with first standards on control and transparency requirements

Ryan Greenblatt@RyanPGreenblatt

I'm pretty excited about Guidelight.

Steven Adler@sjgadler

Some personal news: I've started a new AI safety standards org, and our first two standards are out today.

We're called Guidelight, co-founded with fellow ex-OpenAI safety researcher, Page Hedley. (1/n)

Steven Adler@sjgadler

Each of Guidelight's standards is built around concrete practices, supported by experts, that help to achieve an important principle.

The standards give AI companies a target to meet, and an incentive for all to be safer. Read more at our site: https://www.guidelight.ai/standards

Steven Adler@sjgadler

@OutsourcedLogic thanks! yeah, Principle 4 has some stuff to that effect, though it's tricky. For instance, we ask companies to define their set of 'absolute' boundaries, which are strictly human-in-the-loop. Any takes on things that should definitely be in that category?

Jasmine Wang@j_asminewang

very reasonable proposals!

Tomek Korbak@tomekkorbak

those are great proposals!

Marius Hobbhahn@MariusHobbhahn

Very excited about the new org and the Control standard in particular.

I think it's both very reasonable and also quite feasible to implement!

Seán Ó hÉigeartaigh@S_OhEigeartaigh

Steven and Page are two of the wisest, most capable and most experienced people in this space. I'm tremendously excited to see their work on AI safety standards and practices - much needed, and no better people for the job.

david rein@idavidrein

I'm a big fan of these proposals! They are concrete, actionable steps frontier AI companies can take *today* to preserve control of their internal agents.

Rosie Campbell@RosieCampbell

Hard to think of someone better placed for this work. Congrats @sjgadler!

Julian Brooks@OutsourcedLogic

@sjgadler one small addition: an explicit call-out for human-in-the-loop, especially when a human is affected by the choice the AI will make

page@michaelhpage

Website: http://guidelight.ai
Standards: http://guidelight.ai/standards
xAI report: http://spacexai-risks.org

Steven Adler@sjgadler

@S_OhEigeartaigh this is very very kind, thank you - blushed a bit at it

Steven Adler@sjgadler

@wrenclay w00t! especially in the market for takes on 'next areas we should cover'

Steven Adler@sjgadler

Ryan and Redwood's work has been a big inspiration for me in thinking about important practices, yeah :-) We link to some of it in one of the directions-for-development (specifically about 'making deals').

If you use this expanded link, it's easier to see the full details: https://www.guidelight.ai/control?expand=true

Steven Adler@sjgadler

@mealreplacer I feel you, I’d fully logged out on my phone for like a week, and yet here I am

Steven Adler@sjgadler

@wyatt_benno Yup! I'm especially excited about tamper-evident logging (it's under Principle 1). I worry that even if there's a serious safety incident, if labs haven't kept records like this along the way, they'll be disbelieved :/ I wrote a bit about it here: https://www.lesswrong.com/posts/ETpRwxFfuBYo7JMyd/sjadler-s-shortform?commentId=vuhqXqvHn9jSbm99g
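
Tamper-evident logging can be built in several ways; one common building block is a hash chain, where each log entry commits to the hash of the entry before it, so editing, deleting, or reordering any earlier entry breaks every later link. The Python sketch below is a hypothetical illustration of that general idea only, not Guidelight's specification or any lab's actual system; the names `LogEntry`, `append`, and `verify` are made up for this example.

```python
import hashlib
import json
from dataclasses import dataclass

# Hypothetical sketch: a hash-chained, tamper-evident log.
# Illustrates the general technique only; not Guidelight's actual spec.

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


@dataclass(frozen=True)
class LogEntry:
    payload: dict     # the recorded event (e.g. an agent action)
    prev_hash: str    # hash of the previous entry, linking the chain
    entry_hash: str   # hash over prev_hash plus this entry's payload


def _digest(payload: dict, prev_hash: str) -> str:
    # Canonical JSON (sorted keys, fixed separators) so the same payload
    # always produces the same bytes, and therefore the same hash.
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append(log: list, payload: dict) -> None:
    # Each new entry commits to the hash of the entry before it.
    prev = log[-1].entry_hash if log else GENESIS
    log.append(LogEntry(payload, prev, _digest(payload, prev)))


def verify(log: list) -> bool:
    # Recompute every link from the start; any edit, deletion, or
    # reordering of an entry makes some recomputed hash disagree.
    prev = GENESIS
    for entry in log:
        if entry.prev_hash != prev:
            return False
        if entry.entry_hash != _digest(entry.payload, prev):
            return False
        prev = entry.entry_hash
    return True


if __name__ == "__main__":
    log = []
    append(log, {"action": "run_tool", "tool": "shell"})
    append(log, {"action": "write_file", "path": "notes.txt"})
    print(verify(log))  # True: chain is intact
    log[0] = LogEntry({"action": "noop"}, log[0].prev_hash, log[0].entry_hash)
    print(verify(log))  # False: the edited first entry no longer matches its hash
```

A chain like this only makes tampering evident if its latest hash is also recorded somewhere the log-keeper can't rewrite (for example, shared with an outside party at regular intervals); otherwise someone could edit an entry and simply recompute the whole chain.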

Karim Hummos@AiAnvil

@sjgadler @idavidrein I read some. Looks really good. I’m very impressed. Are you evaluating models or companies? I would consider making an eval/chart showing OpenAI, Anthropic, etc. and how they’ve done on your guidelines over time.

Steven Adler@sjgadler

@GarrisonLovely this is very kind of you, thank you :-)

Garrison Lovely is back in nyc@GarrisonLovely

I've consistently been impressed with Steven's writing, clarity of thought, and expertise. It's really valuable to have ex-insiders be starting orgs like this, and I'm excited to see what they do!

Wyatt Benno@wyatt_benno

This is interesting! Diving in on the verification part… did you know that with cryptography you can make succinctly verifiable proofs? I.e., prove that a guardrail ran without relying only on checking logs and such, and verify that in under 1s for potentially thousands of guardrail checks.

It’s always fun to talk cryptography with the AI labs… as the two worlds need more communication.
