OpenAI has rolled out a new safety-focused reasoning monitor for its latest AI models, o3 and o4-mini. This system is designed to sniff out and block prompts related to biological and chemical threats. Why? Because o3 and o4-mini are smarter, and with great power comes great responsibility.
According to OpenAI, these models show a meaningful capability increase over their predecessors, making them potentially more useful, or more dangerous, in the wrong hands. Internal benchmarks revealed o3 is particularly adept at answering questions about creating certain biological threats. Hence, the new monitor was born.
The system was trained using 1,000 hours of red teaming, where experts flagged unsafe biorisk-related conversations. In tests, it blocked risky prompts 98.7% of the time. Not bad, but OpenAI admits it’s not foolproof. Determined bad actors might try new prompts after being blocked, so human oversight remains crucial.
Interestingly, o3 and o4-mini don’t cross OpenAI’s “high risk” threshold for biorisks. Yet they’re more helpful than o1 and GPT-4 in answering questions about biological weapons. OpenAI’s Preparedness Framework now actively tracks how models could aid in developing such threats.
But here’s the kicker: some researchers worry OpenAI isn’t prioritizing safety enough. For instance, red-teaming partner Metr had limited time to test o3 for deceptive behavior. And GPT-4.1 launched without a safety report. Hmm.
OpenAI also uses similar monitors for other risks, like preventing GPT-4o’s image generator from creating harmful content. It’s a step forward, but the balance between innovation and safety remains a tightrope walk.