Pico-Banana-400K: A Large-Scale Dataset for Text-Guided Image Editing
Recent advances in multimodal models have demonstrated remarkable text-guided image editing capabilities, with
systems like GPT-4o and Nano-Banana setting new benchmarks. However, the research community’s progress remains
constrained by the absence of large-scale, high-quality, and openly accessible datasets built from real images. We
introduce Pico-Banana-400K, a comprehensive 400K-image dataset for instruction-based image editing. We construct the
dataset by using Nano-Banana to generate diverse edit pairs from real photographs in the OpenImages collection.
What distinguishes…