For centuries, human-dolphin interactions have been largely one-sided. We speak; they squeak. We pretend to understand; they probably don’t. 😊 But Google, in collaboration with Georgia Tech and the Wild Dolphin Project (WDP), is changing the game with DolphinGemma. This AI model is designed to understand and even generate dolphin sounds, turning decades of research into actionable insights.
The WDP has been observing a pod of wild Atlantic spotted dolphins since 1985, amassing a vast archive of audio, video, and behavioral data. DolphinGemma, built on Google’s open Gemma models, uses an audio tokenizer like SoundStream to turn dolphin vocalizations into discrete tokens, then predicts the sounds likely to come next. Think of it as autocomplete, but for dolphin chatter. The model is lightweight enough to run on a Google Pixel, making field deployment feasible.
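To make the autocomplete analogy concrete, here’s a toy Python sketch, not the actual DolphinGemma pipeline: assume a SoundStream-style tokenizer has already turned a recording into discrete token IDs (the IDs below are invented), and a simple bigram counter stands in for the Gemma-based predictor.

```python
from collections import Counter, defaultdict

# Hypothetical input: a recording already converted into a sequence of
# discrete audio-token IDs, the kind a SoundStream-style tokenizer produces.
# These particular IDs are made up for illustration.
token_sequence = [3, 7, 7, 2, 3, 7, 2, 9, 3, 7, 7, 2]

# Toy stand-in for the real model: for each token, count which token follows it.
next_counts = defaultdict(Counter)
for current, nxt in zip(token_sequence, token_sequence[1:]):
    next_counts[current][nxt] += 1

def predict_next(token_id):
    """Return the most frequently observed follower of token_id, if any."""
    followers = next_counts.get(token_id)
    return followers.most_common(1)[0][0] if followers else None

# "Autocomplete" the chatter: what tends to come after token 3?
print(predict_next(3))  # -> 7 in this toy data
```

The real model is a far richer sequence predictor, of course, but the shape of the task is the same: given the tokens heard so far, guess what comes next.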
This summer, WDP is testing DolphinGemma in the Bahamas using Pixel 9s in waterproof rigs. The AI listens in real time, identifying vocal patterns and flagging meaningful sequences. But the goal isn’t just passive listening. The team is also developing CHAT (Cetacean Hearing Augmentation Telemetry), a two-way communication system. CHAT assigns synthetic whistles to objects the dolphins like, such as seagrass, and waits to see whether the dolphins mimic those sounds to request them. It’s like creating a shared language, but with underwater mics.
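Here’s an equally rough sketch of the CHAT idea, purely illustrative: each synthetic whistle is reduced to a small feature vector (the objects, numbers, and threshold below are invented, not real CHAT parameters), and an incoming dolphin sound only counts as a “request” if it lands close enough to one of the catalogued whistles.

```python
import math

# Hypothetical whistle catalog: each synthetic whistle is summarized as a
# small feature vector and mapped to the object it stands for.
# Values and objects here are invented for illustration.
WHISTLE_CATALOG = {
    "seagrass": [0.2, 0.8, 0.5],
    "scarf":    [0.9, 0.1, 0.4],
}
MATCH_THRESHOLD = 0.3  # assumed tolerance, not a real CHAT parameter

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def detect_request(heard_features):
    """Return the object whose whistle best matches the heard sound, if close enough."""
    best_object, best_dist = None, float("inf")
    for obj, signature in WHISTLE_CATALOG.items():
        dist = euclidean(heard_features, signature)
        if dist < best_dist:
            best_object, best_dist = obj, dist
    return best_object if best_dist <= MATCH_THRESHOLD else None

# A dolphin mimic close to the "seagrass" whistle registers as a request.
print(detect_request([0.25, 0.75, 0.5]))  # -> "seagrass"
```

The real system works on live hydrophone audio from the Pixel rig rather than hand-written vectors; this just shows the mimic-then-match loop at its simplest.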
DolphinGemma doesn’t just analyze sounds after the fact; it predicts upcoming vocalizations, which helps researchers react in the moment when a dolphin mimics a whistle. Imagine a predictive keyboard, but for dolphins. Google plans to open-source DolphinGemma later this year, allowing other researchers to adapt it for different species.
While we’re far from discussing philosophy with dolphins, this tech could revolutionize interspecies communication. And dolphins aren’t the only beneficiaries—similar AI is being used to decode pig emotions. But let’s be honest, dolphins are way more charismatic. Who knows? Maybe someday, you’ll ask a dolphin for directions—just don’t drop your phone.