Metagaming matters for training, evaluation, and oversight

Following up on our previous work on verbalized eval awareness, we are sharing a post investigating the emergence of metagaming reasoning in a frontier training run. Metagaming is a more general and, in our experience, more useful concept than evaluation awareness. It arises in frontier training runs and does not require training on honeypot environments. Verbalization of metagaming can go down over the course of training. We also share some quantitative analyses, qualitative examples, and upcoming work.
