21 January 2025
I am a postdoctoral researcher at the Amsterdam School of Communication Research, focusing on computational social science. My work involves developing frameworks to integrate data science tools into social science research. Recently, I evaluated linguistic and contextual bias in large language models. I tested the performance of vision-language models for extracting theoretical concepts from social media data. I also study social movements and nationalism through social media analysis using large language models.
Coming from a political science and sociology background where party manifestos and legislative texts are easily accessible, I was surprised to find how challenging it is to access news data in communication science due to copyright restrictions. Many databases, like LexisNexis, are costly and restrict automated batch downloading. Scraping the web for articles from news outlets is also a difficult and time-consuming task. To address this, I initiated a project to curate a global dataset of news articles, using open-access web crawl data from Common Crawl. Our pilot currently covers 16 countries, with plans to expand to 90. We're working on securing funding to scale and enhance the project further.
Data science evolves rapidly, so staying up to date is challenging. What I particularly enjoy about the DSC is the opportunity to connect with colleagues and exchange experiences on the newest methods and techniques. The interdisciplinary nature of the community exposes me to ideas and tools I wouldn't encounter otherwise.
I have a love-hate relationship with generative AI. It's an incredibly powerful and versatile tool when used correctly, but I'm cautious about relying on it for everything. I'm currently working on a project that compares major generative AI models, weighing performance, reproducibility, and openness.
I first worked with R, but now use Python more frequently. Both have their strengths: R excels in statistical tests and visualization (I swear by ggplot!), while Python offers better scalability and is slightly more stable when building apps. My choice often depends on which tool best fits the task at hand.