25 March 2024
'My role spans research in data management for data science, research software engineering, and data stewardship for the Informatics Institute. Every year, I also help teach Big Data, a course that is mandatory for the data science master’s track.'
'I gave a talk at a conference on my paper, “Automated Data Cleaning Can Hurt Fairness in Machine Learning-based Decision Making”. It shows that the standard methods data science practitioners use to clean and prepare data before doing machine learning (a large part of what data scientists spend their time on) can have unintended consequences for the fairness of the trained ML model if they are not carefully considered.'
At the Informatics Institute, there is so much innovation (and also hype) around AI and machine learning that it can be easy to lose sight of all the other meaningful and pioneering applications of data science. I value the occasional change of perspective that comes from talking to DSC members in other disciplines, and it’s always satisfying to zoom out and get the more global perspective of inter- and transdisciplinary work.
'We see a lot of big, beefy deep learning models making the news and winning AI competitions, so I really love to see an elegant old-school solution, such as nearest-neighbor methods for recommending retail or supermarket products. It feels like I’m rooting for the underdogs, and it shows that sometimes smaller and simpler is better than expensive hardware and weeks of model training.'
'I often use (and enjoy) Python, but I am also keeping a close eye on the growing popularity of Rust. It’s not the most approachable language for beginners, but I think there is a lot of potential for making more performant and precise systems.'