Alexander Meinke

I’m Head of Research at Apollo Research, where we work to mitigate the risks from scheming AI models, i.e., models that covertly pursue misaligned goals. Our team studies how scheming emerges and evaluates frontier AI models for signs of scheming and deception.

Apollo Research · Google Scholar · Twitter/X · LinkedIn

Older posts (archive)

These posts are from 2022–2023 and mostly reflect my PhD-era work (adversarial robustness, interpretability, and related topics).

Pitfalls of Interpretability

March 24, 2023 · 17 mins read

3 times when interpretability wasn't interpretable.

Catastrophically Confident Classifiers

November 03, 2022 · 13 mins read

Does your AI know when it doesn't know?

Attacking AI for Fun and Profit

October 09, 2022 · 14 mins read

How I won an ML Security Evasion Competition.

DALL-E 2

September 06, 2022 · 25 mins read

How it works and how it doesn't.

A Fairly Fair Look at Fairness

July 07, 2022 · 15 mins read

Is AI as biased as the headlines suggest?

EU Legislation on AI

April 19, 2022 · 20 mins read

Should you prepare your ML model to be compliant?

Serverless Deployment of Deep Learning Models

March 16, 2022 · 12 mins read

Host your ML models for a few cents a month.

Attacking Lookout for Vision

February 22, 2022 · 10 mins read

Finding adversarial samples in industrial-grade AI.