SETI Institute

For the Search for Extraterrestrial Intelligence (SETI) Institute in California, I worked on an experimental AI project to train an LLM to behave like an extraterrestrial based on chemical data from the atmospheres of exoplanets.

As part of my research project Exoplanetary Poetry, I created JUPI (Just a Universal Poetry Interface), a language model that impersonates an extraterrestrial being, trained on a corpus of scientific papers, poetry, and interpretations of the chemical composition of exoplanetary atmospheres.

Together with my team, I designed and implemented a training pipeline focused on controlled, high-quality data processing rather than large-scale data ingestion. We collected scientific literature, developed scripts to recombine and transform text based on domain-specific keywords, and authored an original poetry corpus to encode semantic and stylistic variation. Several prototype approaches were tested — including rule-based mappings between chemical structures and linguistic tokens — and iteratively refined based on output quality. The final dataset integrates curated scientific content, algorithmically generated text, and manually crafted poetic material.
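The rule-based mapping idea can be sketched as a small lookup from detected atmospheric species to seed tokens for generation. This is a minimal, hypothetical illustration — the formulas, token vocabulary, and function names are my own assumptions, not the project's actual mapping:

```python
# Illustrative mapping from chemical formulas to poetic token clusters.
# All mappings are invented for the sketch, not the project's vocabulary.
CHEM_TO_TOKENS = {
    "H2O": ["mist", "tide", "breath"],
    "CH4": ["marsh", "ember", "veil"],
    "CO2": ["stone", "exhale", "weight"],
    "Na":  ["salt", "glint", "trace"],
}

def atmosphere_to_seed(composition: dict, top_n: int = 2) -> list:
    """Turn a {formula: abundance} reading into seed tokens,
    weighting the most abundant species first."""
    ranked = sorted(composition.items(), key=lambda kv: kv[1], reverse=True)
    seed = []
    for formula, _abundance in ranked[:top_n]:
        seed.extend(CHEM_TO_TOKENS.get(formula, []))
    return seed

# Example: a water- and methane-rich atmosphere.
print(atmosphere_to_seed({"H2O": 0.62, "CH4": 0.30, "CO2": 0.08}))
# → ['mist', 'tide', 'breath', 'marsh', 'ember', 'veil']
```

In a pipeline like this, the resulting seed tokens would condition or prefix the text-generation step, so the model's "voice" tracks the planet's chemistry.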

We fine-tuned an EleutherAI Pythia model using PyTorch, running multiple training cycles with manual filtering and dataset adjustments to improve coherence and diversity. The project emphasizes human-in-the-loop iteration, qualitative evaluation, and transparency in data sourcing. It demonstrates an alternative approach to NLP system design, where model behavior is shaped primarily through dataset curation and domain-specific constraints rather than model scale or architectural complexity.
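The filtering pass between training cycles can be sketched as a heuristic pre-screen ahead of manual review: degenerate or too-short generations are dropped so each cycle's dataset adjustments start from a cleaner pool. The thresholds and helper names here are illustrative assumptions, not the project's actual criteria:

```python
# Heuristic pre-filter for generated samples, run before human review.
# Thresholds are illustrative assumptions.

def distinct_ratio(text: str) -> float:
    """Share of unique words; low values signal degenerate repetition."""
    words = text.lower().split()
    return len(set(words)) / len(words) if words else 0.0

def keep_sample(text: str, min_words: int = 8, min_distinct: float = 0.5) -> bool:
    """Keep a sample only if it is long enough and not loop-degenerate."""
    return len(text.split()) >= min_words and distinct_ratio(text) >= min_distinct

samples = [
    "the the the the the the the the",   # degenerate repetition
    "short line",                        # too short
    "a sodium wind drifts over the glass clouds of the tidally locked world",
]
filtered = [s for s in samples if keep_sample(s)]
print(len(filtered))  # → 1: only the last sample survives
```

Surviving samples would then go to qualitative human review, keeping the loop cheap to run between fine-tuning cycles.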

AI training: Python / PyTorch / PostgreSQL + pgvector

Frontend: PHP / Laravel 12 / Tailwind / Livewire / Filament / Alpine.js / MySQL