How undesired goals can arise with correct rewards
Research
Published: 7 October 2022
Authors: Rohin Shah, Victoria Krakovna, Vikrant Varma, Zachary Kenton

Exploring examples of goal misgeneralisation, where an AI system’s capabilities generalise but its goal doesn’t

As we build increasingly advanced artificial intelligence (AI) systems, we want to make sure they don’t pursue undesired goals. Such behaviour in an AI agent …










