Artificial intelligence has proven capable of incredible feats—generating lifelike images, composing novels, solving complex homework assignments, and even predicting protein structures. However, new research suggests that AI still struggles with a seemingly simple human skill: reading clocks and understanding calendars.
A team of researchers from the University of Edinburgh tested the time-related reasoning of seven widely used multimodal large language models (MLLMs), AI systems that can process more than one type of input, such as images as well as text. Their study, set to be published in April and currently available on the preprint server arXiv, highlights significant shortcomings in how these models handle temporal information.
Why Time-Telling Matters for AI
“The ability to process and reason about time using visual inputs is essential for real-world applications, including scheduling events and operating autonomous systems,” the researchers explained. “Despite advances in multimodal AI, much of the focus has been on object recognition, scene description, and image captioning, leaving temporal reasoning largely unexplored.”
The research team put AI models to the test by showing them a variety of images: analog clocks with different numeral styles, colors, and missing elements (such as second hands), as well as a decade's worth of calendar images. The models analyzed included OpenAI's GPT-4o and GPT-o1, Google DeepMind's Gemini 2.0, Anthropic's Claude 3.5 Sonnet, Meta's Llama 3.2-11B-Vision-Instruct, Alibaba's Qwen2-VL-7B-Instruct, and ModelBest's MiniCPM-V-2.6.
The researchers asked the models simple questions like “What time is shown on this clock?” or “What day of the week is New Year’s Day?” as well as more complex queries such as “What is the 153rd day of the year?”
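For comparison, the arithmetic behind a question like "What is the 153rd day of the year?" is short once the year is known. The Python sketch below is purely illustrative and is not taken from the study; it uses only the standard `datetime` module, and the names `nth_day_of_year`, `weekday_name`, and `WEEKDAYS` are hypothetical. It shows that the answer depends on the year, because a leap day shifts every date after February 28 by one.

```python
from datetime import date, timedelta

# English weekday names indexed by date.weekday() (Monday == 0).
# Hard-coded so the output does not depend on the system locale.
WEEKDAYS = ("Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday")


def nth_day_of_year(year: int, n: int) -> date:
    """Return the date of the n-th day (1-based) of the given year."""
    start = date(year, 1, 1)
    # 365 days in a common year, 366 in a leap year.
    days_in_year = (date(year + 1, 1, 1) - start).days
    if not 1 <= n <= days_in_year:
        raise ValueError(f"{year} has {days_in_year} days; got n={n}")
    # Day 1 is January 1, so add an offset of n - 1 days.
    return start + timedelta(days=n - 1)


def weekday_name(d: date) -> str:
    """Return the English name of the weekday for date d."""
    return WEEKDAYS[d.weekday()]


# The same question has different answers in common and leap years.
print(nth_day_of_year(2025, 153))      # 2025-06-02
print(nth_day_of_year(2024, 153))      # 2024-06-01
print(weekday_name(date(2025, 1, 1)))  # Wednesday
```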
AI’s Poor Performance in Time Interpretation
Reading an analog clock or a calendar involves two kinds of work: the model must recognize visual elements (e.g., clock hands or a calendar's date layout) and then apply numerical reasoning (e.g., converting hand positions into a time or calculating day offsets). Despite AI's strengths in other fields, the models performed poorly on these tasks.
On analog clocks, the models identified the time correctly in fewer than 25% of cases. They struggled most with clocks that used Roman numerals or stylized hands, which suggests that the difficulty lies in detecting where the hands point and interpreting their angles.
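Once the hand positions are known, turning them into a time is simple geometry: the minute hand moves 6 degrees per minute, and the hour hand moves 30 degrees per hour plus 0.5 degrees per minute. The Python sketch below is an illustration, not the study's method. It assumes the hard part, measuring each hand's angle clockwise from 12 o'clock, has already been done, and it ignores the second hand. The function name `time_from_hand_angles` is hypothetical.

```python
def time_from_hand_angles(hour_angle: float, minute_angle: float) -> tuple[int, int]:
    """Convert hand angles to (hour, minute) on a 12-hour analog dial.

    Angles are in degrees, measured clockwise from the 12 o'clock mark.
    """
    # The minute hand sweeps 360 degrees in 60 minutes: 6 degrees per minute.
    minute = round((minute_angle % 360) / 6) % 60
    # The hour hand moves 30 degrees per hour plus 0.5 degrees per minute.
    # Subtracting that minute-driven drift before snapping to the nearest
    # hour keeps, e.g., 2:55 (hour hand at 87.5 degrees, almost on the 3)
    # from being read as 3:55.
    hour = round(((hour_angle - 0.5 * minute) % 360) / 30) % 12
    return (12 if hour == 0 else hour, minute)


print(time_from_hand_angles(105, 180))    # (3, 30)
print(time_from_hand_angles(357.5, 330))  # (11, 55)
```

Reading the minute hand first and using it to correct the hour hand mirrors how people resolve an hour hand that sits between two numerals.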
Among the models, Google's Gemini 2.0 performed best on the clock-reading task, while OpenAI's GPT-o1 led on calendar questions with 80% accuracy, well ahead of the other models. Even so, the strongest calendar model still got one question in five wrong.
Implications for AI in Time-Sensitive Applications
“Most people learn to read clocks and use calendars at an early age. Our findings reveal a notable gap in AI’s ability to handle tasks that humans take for granted,” said Rohit Saxena, co-author of the study and a PhD researcher at the University of Edinburgh’s School of Informatics. “If AI is to be integrated into real-world applications requiring precise time awareness, such as scheduling, automation, and assistive technologies, these shortcomings must be addressed.”
For now, while AI may help with your homework, you might not want to rely on it for keeping track of your deadlines.