Join us for our Data Science Seminar to discover how data science methods are applied in practice and get inspired by real-world examples from a variety of fields. In this talk, we will explore how linguistic and contextual factors, model parameters, and prompts influence LLM performance across different languages and social contexts.
Event details of Data Science Seminar: Multiverse Analysis of Linguistic and Contextual Bias in Large Language Models
Date: 12 March 2025
Time: 16:00 - 17:00
Room: B3.07

About the seminar presentation

Every Model, Every Parameter, All at Once: Multiverse Analysis of Linguistic and Contextual Bias in Large Language Models

Large language models (LLMs) are increasingly being used in research. While there is growing support for multilingual tasks, the development and evaluation of LLMs still predominantly focus on Indo-European languages. As a result, when applied to non-Indo-European data, comparable performance is not always guaranteed, which has significant implications for the types of texts and countries that can be effectively analyzed using LLMs. Previous research demonstrates that LLM performance can be influenced by the linguistic and societal information encoded in their training data. These issues are conceptualized as problems of linguistic transferability (the discrepancy in performance across languages) and contextual transferability (the discrepancy in performance across materials from different social contexts).

We adopt a multiverse analysis approach to investigate biases in LLMs. We define bias as systematic deviations in the performance of LLM tools, arising from four sources: the choice of model, model parameters, prompts, and issues related to linguistic and contextual transferability. To quantify how much each factor affects results, we compare the performance of a wide range of LLMs across data from multiple country cases. This analysis considers all possible combinations of these factors (model version, temperature, n-shot learning, and prompt) and evaluates performance across high- and low-similarity cases using the synthetic data pairs technique, which translates the documents to artificially alter linguistic similarity while keeping other dimensions constant.
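
To make the design concrete, here is a minimal sketch in Python of how such a multiverse grid can be enumerated. The model names, parameter values, country cases, and the evaluate() scorer are illustrative assumptions, not the study's actual configuration.

    # Minimal multiverse-grid sketch. All names and values here are
    # hypothetical placeholders, not the configuration used in the talk.
    import random
    from itertools import product

    models = ["model-a", "model-b"]            # hypothetical model versions
    temperatures = [0.0, 0.7, 1.0]
    n_shots = [0, 1, 5]                        # zero-, one-, and few-shot prompting
    prompts = ["prompt_v1", "prompt_v2"]       # alternative prompt wordings
    cases = ["case-high-sim", "case-low-sim"]  # high/low linguistic-similarity data

    def evaluate(model, temperature, n_shot, prompt, case):
        """Placeholder scorer: stands in for running the LLM and computing a metric."""
        return random.random()

    # One "universe" per combination of analytic choices; comparing scores
    # across universes shows how much each factor shifts performance.
    results = [
        {"model": m, "temperature": t, "n_shot": n, "prompt": p,
         "case": c, "score": evaluate(m, t, n, p, c)}
        for m, t, n, p, c in product(models, temperatures, n_shots, prompts, cases)
    ]

With real data, the synthetic data pairs step would additionally pair each document with a translated counterpart, so that linguistic similarity varies while the other dimensions of the material stay fixed.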

Registration

Everyone from all disciplines is welcome to attend! The presentation will take place in person only, and there will be an opportunity to ask questions and engage in lively discussion on the day. The seminar presentation will be followed by drinks.

About the speaker

Justin Ho is a Postdoctoral Researcher at the Digital Communication Methods Lab, Faculty of Social and Behavioural Sciences, and a Carpentries instructor. Read more about Justin and his work in the DSC Member Spotlight.

Dr. J.C. (Justin) Ho

Faculty of Social and Behavioural Sciences

CW: Political Communication & Journalism

Roeterseilandcampus - building B/C/D (entrance B/C)

Room B3.07
Nieuwe Achtergracht 166
1018 WV Amsterdam