Google’s Gemini has learned to identify genuine cosmic events, including exploding stars and black holes tearing apart passing stars, with approximately 93% accuracy from just 15 labelled example images per survey, according to research that demonstrates how general-purpose AI can tackle complex scientific tasks without extensive training.
The study, published today in Nature Astronomy, shows the large language model can distinguish real astronomical events from imaging artefacts across three major sky surveys whilst providing plain-English explanations for every classification. Researchers from the University of Oxford, Google Cloud and Radboud University gave Gemini 15 labelled examples from each survey, each comprising an image of the new alert, a reference image and a difference image highlighting what had changed.
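The article does not reproduce the prompt itself, but the setup it describes is standard few-shot multimodal prompting. Below is a minimal sketch using Google's public google-generativeai Python SDK; the model name, file names, labels and prompt wording are illustrative assumptions, not the authors' actual pipeline.

```python
import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_API_KEY")          # placeholder credential
model = genai.GenerativeModel("gemini-1.5-pro")  # model choice is an assumption

# Each labelled example is a triplet of cut-out images plus a human label.
examples = [
    ("new_0.png", "ref_0.png", "diff_0.png", "real"),
    ("new_1.png", "ref_1.png", "diff_1.png", "bogus"),
    # ...the study used 15 labelled examples per survey
]

parts = [
    "You are vetting astronomical transient alerts. For each triplet of "
    "(new, reference, difference) images, answer 'real' or 'bogus' and "
    "explain your reasoning in plain English."
]
for new, ref, diff, label in examples:
    parts += [Image.open(new), Image.open(ref), Image.open(diff),
              f"Label: {label}"]

# Append the unlabelled alert we want classified.
parts += [Image.open("alert_new.png"), Image.open("alert_ref.png"),
          Image.open("alert_diff.png"), "Label:"]

response = model.generate_content(parts)
print(response.text)  # a label plus a plain-English justification
```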
Modern telescopes generate millions of alerts nightly about potential cosmic changes, with the vast majority representing false signals from satellite trails, cosmic ray hits or instrumental artefacts. The next generation of telescopes such as the Vera C. Rubin Observatory will output around 20 terabytes of data every 24 hours, making manual verification impossible.
The team tested Gemini on the ATLAS, MeerLICHT and Pan-STARRS sky surveys, with the AI providing classifications, priority scores and readable descriptions of its decisions. A panel of 12 astronomers reviewed the AI’s explanations and rated them as highly coherent and useful.
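The article does not give the output schema, but the answer it describes, a classification plus a priority score and a readable justification, might be requested as structured output along these lines (every field name and value here is invented for illustration):

```python
# Illustrative only: an assumed shape for one alert's answer, not the
# study's actual schema.
example_output = {
    "classification": "real",  # genuine astrophysical event vs artefact
    "priority": 0.92,          # follow-up priority score
    "explanation": (
        "A new point source appears in the difference image near the host "
        "galaxy, is absent from the reference image, and is inconsistent "
        "with a cosmic-ray hit or satellite trail."
    ),
}
```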
“It’s striking that a handful of examples and clear text instructions can deliver such accuracy,” said Dr Fiorenzo Stoppa, co-lead author from the University of Oxford’s Department of Physics. “This makes it possible for a broad range of scientists to develop their own classifiers without deep expertise in training neural networks — only the will to create one.”
The model also assessed its own work, assigning a coherence score to each answer, and low-coherence outputs proved far more likely to be incorrect. By feeding this signal into a self-correction loop that refined the initial examples, the team lifted performance on one dataset from roughly 93.4% to 96.7%.
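The article does not detail the loop's mechanics. One plausible reading, sketched below, is that alerts whose answers score low on coherence are reviewed and folded back into the example set; the Alert type, the classify callable and the 0.7 threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class Alert:
    images: tuple    # (new, reference, difference) image data
    true_label: str  # "real" or "bogus"; known for the refinement set

def refine_examples(
    examples: List[Alert],
    refinement_set: List[Alert],
    classify: Callable[[List[Alert], Alert], Tuple[str, float]],
    threshold: float = 0.7,  # assumed cut-off; not given in the article
) -> List[Alert]:
    """Grow the few-shot set wherever the model's own coherence is low."""
    refined = list(examples)
    for alert in refinement_set:
        label, coherence = classify(refined, alert)
        # Low-coherence answers were far more likely to be wrong in the
        # study, so fold those alerts (and outright misclassifications)
        # back into the labelled examples to steer future classifications.
        if coherence < threshold or label != alert.true_label:
            refined.append(alert)
    return refined
```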
Professor Stephen Smartt from the University of Oxford’s Department of Physics said: “I’ve worked on this problem of rapidly processing data from sky surveys for over 10 years, and we are constantly plagued by weeding out the real events from the bogus signals in the data processing. We have spent years training machine learning models, neural networks, to do image recognition. However, the LLM’s accuracy at recognising sources with minimal guidance rather than task-specific training was remarkable. If we can engineer to scale this up, it could be a total game changer for the field, another example of AI enabling scientific discovery.”
The team envisions autonomous agentic assistants that could integrate multiple data sources, check their own confidence, autonomously request follow-up observations from robotic telescopes and escalate only the most promising discoveries to human scientists.
“We are entering an era where scientific discovery is accelerated not by black-box algorithms, but by transparent AI partners,” said Turan Bulmus, co-lead author from Google Cloud. “This work shows a path towards systems that learn with us, explain their reasoning, and empower researchers in any field to focus on what matters most: asking the next great question.”