Will Claude cause the next Covid?

Crossposted from my blog.

Biosafety remains a relatively unexplored topic within the AI safety community. This post aims to briefly summarize the bio-capabilities of currently available models, as well as mitigation efforts.

Current Capabilities

AI systems could theoretically uplift non-experts to synthesize, acquire, and disseminate biological weapons, and could raise the ceiling of harm by creating agents more deadly or more resistant to current medicine.

Biological systems are complex, however. They involve interplay between many different molecules of varying complexity, embedded in systems we understand to very different degrees.

Small molecules, for example, are (as the name suggests) small organic molecules that regulate many processes in the body and are a common medicinal target. AI systems have been remarkably successful at rapidly generating small-molecule drugs. Insilico, a generative-AI software company, has developed at least 28 drugs using generative AI tools, with nearly half already at the clinical stage. It can now take as little as 12 months to bring a drug to preclinical testing (compared to 3–6 years using traditional methods), and the target-identification step has been cut to roughly 30 days. It should be noted, however, that these are not end-to-end models: to reach the preclinical stage, a candidate still has to pass through multiple in-vitro stages, including lab validation, animal-model validation, several optimization rounds, and safety checks, all of which remain a bottleneck.

Although genetic material is a significantly more complex molecule, progress in AI-powered Biological Design Tools (BDTs) is also significant. While less established than small-molecule generation, significant leaps have been made in mRNA generation (with up to 41-fold increases in protein expression), DNA assembly, and methods that substantially reduce the cost of manufacturing biomolecules designed by generative systems. Although current models are estimated to pose relatively low immediate risk, RAND estimates that this risk landscape could shift significantly as soon as 2027.

As BDT capabilities increase, several questions naturally arise. These tools typically optimize for properties like absorption, binding affinity, solubility, and toxicity. Even when they are trained with RLHF techniques, their 'goals' are mathematically bounded: a model that fails via reward hacking will simply score poorly on molecular benchmarks; it does not exhibit the potentially malicious goal-directed behavior that generalized LLMs can.
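To make the "bounded objectives" point concrete, here is a minimal sketch of the kind of multi-property scoring such design loops optimize against. It assumes RDKit and uses drug-likeness (QED) and logP as stand-ins for the richer property predictors real BDTs use; the weights and candidate molecules are purely illustrative.

```python
# A toy version of the bounded, multi-property objective a molecular
# design loop might optimize. Requires: pip install rdkit
from rdkit import Chem
from rdkit.Chem import QED, Descriptors

def score(smiles: str) -> float:
    """Score a candidate molecule on a few bounded properties.

    Returns 0.0 for unparseable SMILES; otherwise a weighted sum of
    drug-likeness (QED, already in [0, 1]) and a solubility proxy
    (logP squashed into [0, 1]). Weights are illustrative.
    """
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return 0.0
    qed = QED.qed(mol)                               # drug-likeness, in [0, 1]
    logp = Descriptors.MolLogP(mol)                  # lipophilicity proxy
    logp_ok = max(0.0, 1.0 - abs(logp - 2.0) / 5.0)  # prefer logP near 2
    return 0.7 * qed + 0.3 * logp_ok

# Rank hypothetical candidates; a generative model would propose these.
candidates = ["CCO", "CC(=O)Oc1ccccc1C(=O)O", "c1ccccc1"]
print(sorted(candidates, key=score, reverse=True))
```

Whatever the generator proposes, the objective stays a bounded number: "failure" looks like a low score, not deceptive behavior.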
So what happens when these tools are combined with an LLM that can plan multi-step strategies, pursue long-term objectives across contexts, engage in scheming, or hide its capabilities? Or, alternatively, what happens when generalized models become powerful enough to output meaningful biochemical sequences on their own?

This concern is compounded by articles like "AIs can provide expert-level virology assistance", which benchmarked frontier models against PhD-level virologists. The virologists scored 22.1% on questions in their own sub-areas of expertise, while OpenAI's o3 scored 43.8%, placing it in the 94th percentile of the expert pool. Notably, no refusals were observed during evaluation, meaning existing safety interventions never triggered their misuse safeguards.

For now, generalized models alone are not powerful enough to output meaningful genetic sequences. Anthropic reported "text-based uplift trials suggesting that Claude Opus 4 and Claude Sonnet 4 provided uplift (2.53× and 1.70×, respectively, compared to the control), but neither provided significant additional risk".

So how do we continue developing capable BDTs while limiting misuse risk, and how do we prepare for the deployment of agentic research systems and more powerful LLMs?

Mitigations

The frameworks being put into place advocate for restricting access to specialized models, setting capability limits on large models, and restricting the deployment of biologically capable agents, so as to avoid crossing the digital/physical frontier.

In 2024, SecureBio, the organization responsible for the biorisk evals on previous generations of Anthropic, OpenAI, and Google models, specifically recommended the following:

- Ensure the AI Risk Management Framework discusses biosecurity risks from foundation models and biological design tools (BDTs).
- Evaluations for CBRN risks from AI should include static benchmarks, model-graded evaluations, and task-based evaluations, to assess both models' raw capabilities and their dissemination of dual-use information.
- Conduct AI red-teaming exercises to assess biosecurity risks from a diverse set of actors, and construct them in a manner that facilitates structured, scalable evaluation while allowing for creativity in red-teamers' approaches.
- Establish standards that involve comprehensive risk assessments, rigorous pre-deployment evaluations of AI models, adherence to Know-Your-Customer standards, and specific guidelines for BDTs, to effectively manage the biosecurity risks associated with AI tools.

However, current regulatory frameworks around the topic are severely lacking. SecureBio itself uses a somewhat standardized suite of benchmarks for risk assessment of generalized models, testing the following capabilities:

- The Virology Capabilities Test (VCT), which measures a model's ability to provide expert-level practical assistance in work with viruses;
- DNA synthesis capabilities and the ability to bypass screening systems;
- Capacity for novel biological insight on open-ended reasoning tasks.

Other tests aim to evaluate uplift, factual bioweapons knowledge, short-term bio-computational work, and agentic workflows.

Even if dangerous capabilities are clearly outlined, it is not clear that capability limits can be enforced. The EU AI Act, for example, does not address powerful biological models trained below the 10²⁵-FLOP training-compute threshold it sets for large generalized models.
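To see why that threshold misses specialized models, a back-of-the-envelope calculation helps. Using the common approximation C ≈ 6·N·D for training compute (N parameters, D training tokens), consider a hypothetical 15-billion-parameter biological sequence model trained on one trillion tokens; both numbers are illustrative assumptions, not figures from any specific system:

```python
# Back-of-the-envelope training compute via the standard C ≈ 6 * N * D rule.
# Model size and token count are hypothetical, illustrative values.
params = 15e9    # N: 15B parameters
tokens = 1e12    # D: 1 trillion training tokens
compute = 6 * params * tokens          # ≈ 9.0e22 FLOPs

threshold = 1e25                       # EU AI Act systemic-risk threshold
print(f"training compute: {compute:.1e} FLOPs")
print(f"margin under threshold: {threshold / compute:.0f}x")  # ~111x
```

A specialized model of that scale would sit roughly two orders of magnitude below the threshold, and thus outside the Act's systemic-risk obligations, regardless of how capable it is in its narrow domain.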
Technical safeguards are also severely neglected. Limiting the domain-specific capabilities of AI systems has been proposed as one possible risk-mitigation strategy. This can be done through data filtering, as well as through unlearning during the fine-tuning stage.
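As a rough illustration of the data-filtering approach, here is a minimal sketch that drops training documents matching a hazard vocabulary. The patterns and corpus are hypothetical stand-ins; real pipelines typically rely on trained classifiers and curated sequence databases rather than simple keyword matching.

```python
import re
from typing import Iterable, Iterator

# Hypothetical stand-ins for a curated hazard vocabulary.
HAZARD_PATTERNS = [
    re.compile(p, re.IGNORECASE)
    for p in [r"\breverse genetics\b", r"\bgain[- ]of[- ]function\b"]
]

def filter_corpus(docs: Iterable[str]) -> Iterator[str]:
    """Yield only documents that match no hazard pattern."""
    for doc in docs:
        if not any(p.search(doc) for p in HAZARD_PATTERNS):
            yield doc

corpus = [
    "Cells were cultured at 37 C in standard growth medium.",
    "We recovered infectious virus using a reverse genetics system.",
]
print(list(filter_corpus(corpus)))  # keeps only the first document
```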
All of these techniques are limited. Data filtering can fall short if the model has access to search tools and can recover the filtered information given enough context. Unlearning could be a promising strategy for disrupting knowledge of a particular system, but studies have noted capability loss in adjacent, harmless domains such as college-level biology, as well as vulnerability to relearning through fine-tuning. Other techniques that would address more autonomous systems, such as chain-of-thought monitoring and interpretability methods, still need to be explored as those systems arise.

So, should we be worried?

The propagation of an AI-enabled supervirus is not physically or biologically impossible, nor is it beyond imagination given the current state of BDTs. For now, wet-lab experiments still form a large bottleneck, to say nothing of the difficulty of propagating a virus that remains viable (given its vulnerability to UV radiation, for example). But considering the current state of available safeguards for biological systems, whether technical or regulatory, I argue we should be at least a little worried.