Christian Arnold: Introduction to Deep Learning

This workshop introduces researchers to the fundamentals of deep learning. The first part of the workshop is devoted to explaining the basic mathematics of a fully connected neural network. The second part walks participants through a hands-on example of deep learning using Keras. The workshop also surveys resources available for researchers interested in getting started with deep learning in their own work.
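As a rough illustration of the mathematics involved (not taken from the workshop materials), the forward pass of a fully connected network can be sketched in plain Python. Each layer computes activation(Wx + b). The weights, biases, and layer sizes below are hypothetical values chosen only to make the arithmetic easy to follow.

```python
import math

def relu(z):
    # Rectified linear unit: passes positive values, zeroes out negatives.
    return max(0.0, z)

def sigmoid(z):
    # Squashes any real number into the interval (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

def dense(inputs, weights, biases, activation):
    # One fully connected layer: every output unit sees every input.
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

# Hypothetical parameters: 2 inputs -> 2 hidden units -> 1 output.
W1 = [[0.5, -0.5], [1.0, 1.0]]
b1 = [0.0, -1.0]
W2 = [[1.0, -1.0]]
b2 = [0.0]

def forward(x):
    # Hidden layer with ReLU, then a sigmoid output unit.
    hidden = dense(x, W1, b1, relu)
    return dense(hidden, W2, b2, sigmoid)[0]

print(forward([1.0, 1.0]))
```

In Keras, the same structure would typically be expressed with stacked Dense layers, with the weights learned from data rather than fixed by hand.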

Nicolai Berk: Supervised machine learning with imbalanced data

Many researchers employing supervised machine learning have to handle highly imbalanced training data. Detecting hate speech online, identifying migration content in news coverage, and predicting the likelihood of war in a country are all classification problems with a small minority class. This workshop will cover different techniques that deal with this issue: stratified random sampling based on keyword frequencies, the synthetic minority oversampling technique (SMOTE), and active learning. We will briefly discuss the logic behind these techniques, but ultimately focus on their implementation in Python based on real data. Following this workshop, attendees will have been introduced to different solutions for training supervised models on imbalanced data and be able to implement them using Python.
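To give a sense of the logic behind synthetic minority oversampling, here is a minimal sketch in plain Python. It is not the workshop's own code: the function name `smote_sketch`, the toy data, and the parameter defaults are assumptions made for illustration. Each synthetic point is placed at a random position on the line segment between a minority example and one of its nearest minority neighbours.

```python
import math
import random

def smote_sketch(minority, n_new, k=3, seed=0):
    # Hypothetical helper: returns n_new synthetic minority-class points.
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority neighbours of the base point, excluding itself.
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        # Random position along the segment from base to neighbour.
        gap = rng.random()
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic

# Toy minority class with two numeric features (made-up values).
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_sketch(minority, n_new=6)
print(len(new_points))
```

Because every synthetic point is an interpolation between two real minority examples, it stays inside the region the minority class already occupies. For applied work, a maintained implementation such as the SMOTE class in the imbalanced-learn package is usually preferable to a hand-written version.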

Akos Mate: Data Visualization in R

The applied data visualization workshop will provide an accessible introduction to creating informative and engaging visualizations. We will use R’s ggplot2 package during the workshop, so some knowledge of R is helpful (but not necessary). The materials will cover a wide range of visualizations: distributions, proportions, associations between variables, time series, and, if time permits, geospatial data and networks.

Hauke Licht: Multilingual supervised text classification

This workshop first introduces participants to a set of approaches that are commonly applied to handle multilingual corpora, discussing their respective strengths and weaknesses. The workshop then walks through an applied example covering free machine translation, multilingual sentence and word embeddings, and – time permitting – multilingual Transformers.

Natalia Umansky: Collecting and Analyzing Twitter Data

This hands-on workshop will focus on collecting, cleaning, tidying, and analysing data from Twitter using the software environment R. Attendees will be equipped to deal with the challenges of working with Twitter data, including data collection biases, access constraints, data storage, and noisy text. To this end, the workshop will include both theoretical discussions and guided practical exercises. Attendees are welcome to bring their own research questions and will be supported in adapting the exercises accordingly.

Christian Pipal: Joint Estimation of Sentiment and Topics in Textual Data

How can we separate what is said from how it is said in text? Topics and sentiment are often conflated, and some words might indicate a certain emotion in one context, but not in another. Joint sentiment-topic models offer a solution, as they estimate topics and sentiment simultaneously. This tutorial introduces the ‘sentitopics’ R package as a solution to the practical problems one encounters when measuring topics and sentiment in text.