
Optimal Splitting of Language Models from Mixtures to Specialized Domains

This paper was accepted at the Workshop on Navigating and Addressing Data Problems for Foundation Models at ICLR 2026.
Language models achieve impressive performance on a variety of knowledge, language, and reasoning tasks, owing to the scale and diversity of the available pretraining data. The standard training recipe is a two-stage paradigm: pretraining on the full corpus first, followed by specialization on a high-quality, specialized subset of that corpus. In the multi-domain setting, this involves continued pretraining of a separate model on each specialized domain, referred…
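
As a concrete illustration of the two-stage recipe the abstract describes, here is a minimal PyTorch sketch: one base model is pretrained on the full mixture, then a separate copy is continued-pretrained on each specialized domain. The toy model, synthetic data, domain names, and hyperparameters are illustrative placeholders, not the paper's actual setup.

```python
# Sketch of the two-stage paradigm: pretrain on the mixture, then
# continue pretraining one copy per domain. All specifics are assumed.
import copy
import torch
import torch.nn as nn

VOCAB, SEQ = 256, 32

def make_model():
    # Toy next-token predictor standing in for a transformer LM.
    return nn.Sequential(nn.Embedding(VOCAB, 64), nn.Linear(64, VOCAB))

def batches(seed, n):
    # Synthetic token streams standing in for a real corpus/domain subset.
    g = torch.Generator().manual_seed(seed)
    for _ in range(n):
        tokens = torch.randint(0, VOCAB, (8, SEQ + 1), generator=g)
        yield tokens[:, :-1], tokens[:, 1:]  # inputs, next-token targets

def train(model, data, lr):
    # Generic language-model loop: cross-entropy on the next token.
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for inputs, targets in data:
        logits = model(inputs)  # (batch, seq, vocab)
        loss = loss_fn(logits.flatten(0, 1), targets.flatten())
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model

# Stage 1: pretrain a single base model on the full data mixture.
base = train(make_model(), batches(seed=0, n=500), lr=3e-4)

# Stage 2: continued pretraining of a separate copy per specialized domain,
# yielding one specialized model ("expert") per domain.
experts = {
    domain: train(copy.deepcopy(base), batches(seed=s, n=50), lr=3e-5)
    for s, domain in enumerate(["code", "math", "medical"], start=1)
}
```

The key design point the sketch makes explicit is that stage 2 forks the shared base checkpoint rather than training each domain model from scratch, which is what "continued pretraining" in the multi-domain setting refers to.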
