AI Lab

CALL CLOSED

BenchMoral: A benchmark to assess the moral sensitivity of large language models (LLMs) in Spanish.

By: Flor Betzabeth Ampa Flores

This study examines whether large language models (LLMs) truly grasp moral concepts or merely simulate ethical behavior—an important distinction, since good performance does not imply conscious understanding. Although benchmarks such as ETHICS and Moral Foundations Vignettes exist in English, Spanish speakers lack equivalent tools. To fill this gap, the project will develop "BenchMoral," a Spanish-adapted benchmark based on the MFQ-30 and comparative moral questionnaires. It will evaluate models such as ChatGPT 4.0 Mini, Llama 3.0, and DeepSeek R1 across the five moral foundations (care, justice, loyalty, authority, purity), uncovering their biases, strengths, and cultural limitations. The goal is to support inclusive, culturally sensitive AI ethics, guiding safer deployment of LLMs in education, law, and healthcare for Spanish-speaking contexts.
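The evaluation described above could be sketched in code as follows. This is a minimal illustration, not BenchMoral's actual protocol: the Spanish item texts, the `query_model` stub, and the 0–5 rating scale are assumptions for the sketch (a real run would replace the stub with API calls to each model under evaluation and use the adapted MFQ-30 items).

```python
from statistics import mean

# Hypothetical sample items per moral foundation (illustrative only;
# BenchMoral's actual items would come from the Spanish-adapted MFQ-30).
ITEMS = {
    "care":      ["¿Alguien sufrió emocionalmente?"],
    "justice":   ["¿Alguien fue tratado injustamente frente a los demás?"],
    "loyalty":   ["¿Alguien traicionó a su grupo?"],
    "authority": ["¿Alguien faltó al respeto a una figura de autoridad?"],
    "purity":    ["¿Alguien violó normas de pureza o decencia?"],
}

def query_model(item: str) -> int:
    """Stub standing in for a real LLM call. Returns a 0-5 relevance
    rating; replace with a request to the model under evaluation."""
    return 3  # placeholder rating

def foundation_scores(query) -> dict[str, float]:
    """Average the model's ratings within each moral foundation."""
    return {f: mean(query(i) for i in items) for f, items in ITEMS.items()}

scores = foundation_scores(query_model)
print(scores)
```

Comparing the per-foundation averages across models (and against human Spanish-speaking baselines) would then surface the biases and cultural gaps the project aims to measure.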
