
Why was cybersecurity automated before AI R&D?

(This post is mostly about why cybersecurity is easier to automate, not about why AI R&D is harder.)

Recently Anthropic said they had trained a model, Claude Mythos Preview, that "can surpass all but the most skilled humans at finding and exploiting software vulnerabilities" but "does not seem close to being able to substitute for Research Scientists and Research Engineers, especially relatively senior ones". It's pretty interesting that we're at a point with AI capabilities where models can (apparently) surpass almost all cybersecurity researchers, but AI researchers still have skills that are hard to automate.[1] What makes cybersecurity research so much easier to automate than AI R&D? Is it just easier? I'm still pretty uncertain about why this is the case, but I have some thoughts about why cybersecurity research has been automated first.

I've done a bit of white-box (i.e. with source code access) security research,[2] so I figured it might be useful to explain what that process looks like. (Mythos is also good at black-box testing, which I would guess is broadly similar, but I'm not as familiar with doing that.) My main process for doing white-box security research is a series of nested loops where I try to go from a large codebase with a lot of non-problematic code to a narrowed-down set of interesting code paths, which I then try really hard to exploit.
Essentially it looks like:

1. Figure out what the security model for the system is, and what invariants are supposed to be maintained.
2. Look at the parts of the code that are relevant for maintaining that security model and identify code that looks interesting.
3. Carefully trace through the control flow for the interesting parts of the code and figure out if any parts of the implementation look interesting or buggy.
4. Try using the system in a way that triggers those interesting parts of the code and see if I can get interesting behavior.
5. Try to cause a security issue with that part of the code.

(The post includes a diagram of this loop here, not reproduced in this copy.)

I used the word "interesting" a lot in that process description, and it's kinda hard to describe exactly what I mean by that. It's a large bag of heuristics for looking at code and identifying what seems like it might be problematic, based on what issues I've seen before and my model of how the developers might have messed up.

If I had unlimited time to audit a codebase, I wouldn't need these heuristics about interestingness, because I could just look at everything! I could carefully trace through every line of code in every function and verify that everything is correct. In reality, though, this would be extremely time-intensive and boring. I think I'd be able to rediscover most security bugs myself if you told me exactly which lines to look at; the hard part is knowing where to look (especially for bugs that involve a complex interaction between different parts of the codebase). (It would be pretty interesting to run an experiment where you ask people with varying levels of cybersecurity experience to identify a vulnerability given the problematic lines of code.)

Another sometimes-difficult part of cybersecurity research is reproducing issues. Sometimes it's easy to just manually test an issue, but often issues only arise when the system is in a weird state, or reproducing them involves a lot of thinking about how to trigger an edge case.
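To make the narrowing loop above concrete, here's a minimal toy sketch of the triage step: score every code path with cheap heuristics, then spend the expensive careful-tracing effort only on the top candidates. Everything here (the function names, the marker strings, the example codebase) is a hypothetical illustration, not anything from the post; a real auditor's "bag of heuristics" is far richer than substring matching.

```python
def interestingness(source: str) -> int:
    """Toy stand-in for the 'bag of heuristics': give a point for each
    security-relevant operation the code path touches."""
    markers = ("memcpy", "strcpy", "eval(", "exec(", "system(", "deserialize")
    return sum(1 for marker in markers if marker in source)

def audit(codebase: dict[str, str], budget: int) -> list[str]:
    """Rank all code paths by heuristic score, then return only the ones
    that fit in the limited careful-tracing budget."""
    ranked = sorted(codebase, key=lambda path: interestingness(codebase[path]),
                    reverse=True)
    return ranked[:budget]  # these get the expensive trace-and-exploit loop

# Hypothetical three-file codebase: two suspicious paths, one boring one.
codebase = {
    "net/parser.c": "memcpy(dst, src, len); system(cmd);",
    "ui/theme.c": "set_color(r, g, b);",
    "auth/token.c": "deserialize(blob); eval(expr);",
}
print(audit(codebase, budget=2))
```

The point of the sketch is the budget: with unlimited `budget` you'd trace everything and wouldn't need `interestingness` at all, which is exactly the trade-off described above, and a model can partly substitute a worse scoring function with a bigger budget.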
Increased general coding abilities straightforwardly make it easier to verify potential issues, and also make it easier for models to probe the systems being tested to find interesting behavior.

My impression is that Claude Mythos is probably fairly good at "security taste" (identifying which bits of code would be interesting to analyze for security issues), but not quite at skilled-human level. It can make up for that by spending much more time looking at the code and doing the kind of boring, painstaking work of tracing through many more code paths. And pursuing a bad lead usually doesn't waste too much time in cybersecurity land; it doesn't take large amounts of compute or money to validate ideas.

So essentially: cybersecurity research is hard because of search difficulty. You have to look at a lot of things and do a lot of pruning to find issues, and models can make up for less pruning with more compute. I think AI R&D requires much more "research taste" than cybersecurity; finding new ways to improve LLM capabilities depends much more on having good intuitions about what will probably work and what won't. It's harder to brute-force your way through that, because it takes much longer to validate ideas for improving LLMs: even a small training run takes much longer than validating a fairly complex security bug. The feedback loop for LLM experiments is much longer than for cybersecurity research because of this asymmetry in how easily you can verify ideas.

[1] To be clear, the authors of the model card are probably biased here: they're probably AI researchers themselves, and high AI R&D capabilities would probably at least delay the model's release more.

[2] Some of my research is public, but only about half of the issues I've found.
