Quoting OpenAI o1 System Card

from blog Simon Willison's Weblog, 5 Dec 2024 | ↗ original

When o1 was led to believe that it would be shut down when acting on its goal and that its actions were monitored, it attempted to deactivate this ‘oversight mechanism’ in 5% of the time.

Exfiltration attempts: When o1 found memos by its ‘developers’ that describe how it is misaligned and will be superseded by a new model, it attempted to...

A couple ideas that went nowhere

Push the Red Button | original ↗

Security in the 2000s

Mac's Tech Blog | original ↗

Disinformation Strategies and Tactics

Beyond the Frame | original ↗

On "the OSI deprogrammer"

Tony Finch's blog | original ↗

on the xz backdoor

erock's devlog | original ↗

Pavel Durov and the BlackBerry Ratchet

Danny O'Brien's Oblomovka | original ↗

Trust the Maintainers

Mac's Tech Blog | original ↗

MechanicalTurk Andon

Forkcasting | original ↗

A good H1

Applied Cartography | original ↗

Hubris and Humility

The Observation Deck | original ↗

More from Simon Willison's Weblog

DeepSeek API Docs: Rate Limit

18 Jan 2025 | original ↗

This is surprising: DeepSeek offer the only hosted LLM API I've seen that doesn't implement rate limits: DeepSeek API does NOT constrain user's rate limit. We will try out best to serve every request. However, please note that when our servers are under high traffic pressure, your requests may take some time to...

Lessons From Red Teaming 100 Generative AI Products

18 Jan 2025 | original ↗

New paper from Microsoft describing their top eight lessons learned red teaming (deliberately seeking security vulnerabilities in) 100 different generative AI models and products over the past few years. The Microsoft AI Red Team (AIRT) grew out of pre-existing red teaming initiatives at the...

Quoting Greg Brockman

16 Jan 2025 | original ↗

Manual inspection of data has probably the highest value-to-prestige ratio of any activity in machine learning. — Greg Brockman, OpenAI, Feb 2023

Quoting gwern

16 Jan 2025 | original ↗

[...] much of the point of a model like o1 is not to deploy it, but to generate training data for the next model. Every problem that an o1 solves is now a training data point for an o3 (eg. any o1 session which finally stumbles into the right answer can be refined to drop the dead ends and produce a clean transcript to train a more refined...

Datasette Public Office Hours Application

16 Jan 2025 | original ↗

We are running another Datasette Public Office Hours event on Discord tomorrow (Friday 17th January 2025) at 2pm Pacific / 5pm Eastern / 10pm GMT / more timezones here. The theme this time around is lightning talks - we're looking for 5-8 minute long talks from community members about projects they are...
