I am especially interested in the risks posed by advanced AI systems that autonomously pursue goals. Under what circumstances will such systems begin to deceive humans in order to achieve those goals? Can we detect such deception? Can we perhaps even prevent it? What would a world look like in which we couldn't? To what degree are these challenges merely technical, and what role does governance play in developing safe AI?
I earned my PhD under the supervision of Matthias Hein as a member of the International Max Planck Research School for Intelligent Systems (IMPRS-IS).