AI Lab


Performance of Large Language Models (LLMs) in Complex Analysis: A Benchmark of Mathematical Competence and Its Role in Decision Making.

By: Jaime Esteban Montenegro Barón

LLMs struggle with advanced mathematical reasoning in Spanish (despite its status as the world's second-most spoken mother tongue) because they are trained predominantly on English data and rely on pattern recognition rather than genuine deductive logic. Users often over-trust model outputs without verifying their accuracy, so "hallucinations" in complex domains can go undetected.

Current benchmarks (e.g., FrontierMath) under-sample specialized areas such as Complex Variable Functions (only 2.4% of tasks), leaving gaps in our understanding of LLM performance on those topics. Because errors in high-stakes fields like medicine, economics, or engineering can have serious consequences, it is critical to rigorously evaluate these models' accuracy in under-tested technical scenarios before deploying them as decision-support tools.
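To make the kind of evaluation proposed here concrete, below is a minimal sketch in Python of what such a harness could look like: complex-analysis tasks posed in Spanish, scored against exact reference values with a small numeric tolerance. Everything in it (the TASKS items, the ask_model stub, the is_correct tolerance check) is an illustrative assumption, not the project's actual protocol.

```python
"""Minimal sketch of a benchmark harness for the study described above.

All names here (TASKS, ask_model, is_correct, run_benchmark) are
illustrative assumptions, not the project's actual design.
"""
import cmath

# Hypothetical task set: complex-analysis questions posed in Spanish,
# each with a single complex-valued reference answer.
TASKS = [
    {
        "prompt": "Calcule el residuo de f(z) = 1/(z^2 + 1) en z = i.",
        # Res_{z=i} 1/(z^2+1) = 1/(2i) = -i/2
        "reference": complex(0, -0.5),
    },
    {
        "prompt": "Evalúe la integral de e^z / z sobre |z| = 1 (sentido positivo).",
        # 2*pi*i by Cauchy's integral formula
        "reference": complex(0, 2 * cmath.pi),
    },
]

def ask_model(prompt: str) -> complex:
    """Placeholder for a call to the LLM under evaluation.

    A real harness would send `prompt` to the model's API and parse a
    complex number out of the free-text reply; no model is wired in here.
    """
    raise NotImplementedError("plug in the model call here")

def is_correct(answer: complex, reference: complex, tol: float = 1e-6) -> bool:
    """Exact-value scoring with a small numeric tolerance."""
    return abs(answer - reference) <= tol

def run_benchmark() -> float:
    """Return the fraction of tasks the model answers correctly."""
    correct = 0
    for task in TASKS:
        try:
            answer = ask_model(task["prompt"])
        except NotImplementedError:
            continue  # no model wired in; skip this task
        if is_correct(answer, task["reference"]):
            correct += 1
    return correct / len(TASKS)

if __name__ == "__main__":
    print(f"accuracy: {run_benchmark():.2%}")
```

Exact-value scoring sidesteps subjective free-text grading, but a production harness would also need to parse the model's natural-language answer into a complex number, which is itself a nontrivial and error-prone step.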