19 October 2023
'More and more of our cultural output comes in the form of digital data, so there are many new applications of data science methods in the humanities. I often work with natural language processing methods that help us extract meaning from text, for example to study how a particular philosopher uses certain terms.'
'Last year my student Lizzy Brans and I set up a survey to gather word similarity judgements from native speakers of Dutch. Such datasets are used to evaluate and calibrate large language models in English, German and other languages, but none existed yet for the Dutch language. Last month, we presented the results at the Dutch computational linguistics conference. As someone who worked in Dutch linguistics, I am happy to help keep the language technology for our small language up to the same standards as that of the major languages!'
'It is a great way to connect with people from other departments and faculties with similar interests. Many different fields deal with textual data and it is interesting to learn how it is done elsewhere. I would never have ended up collaborating with a political scientist without the DSC!'
'Natural Language Processing (NLP) techniques are of course essential for extracting anything but the most superficial information from text. Within that area, I like any method that I can evaluate and adapt to the relevant context myself! It is good to be critical of the methods that you use.'
'Why choose? I have taught courses on both! Each has its strengths and weaknesses. Personally I prefer using R for inferential statistics and certain statistical tasks such as computing inter-annotator agreement, and Python for machine learning, language modeling and most other things.'