8th Annual COMPTEXT Conference 2026
Workshop Program on April 23, 2026

Morning

Introduction to Python for Data Management and Analysis

Lecturer: Allison Koh

Many researchers already use tools like R and Stata for data analysis. However, most of the cutting-edge work in machine learning and natural language processing happens in Python. This workshop offers a practical introduction to programming and data management in Python, designed for researchers who are familiar with applied data science methods but new to Python. Participants will learn the building blocks needed to navigate Python syntax, adapt existing code from other programming languages such as R, and begin integrating Python into their workflows for data management and analysis. The skills covered here also serve as a foundation for working with unstructured and multimodal data sources.

Using AmCAT as a collaboration environment for discovering, accessing, analysing and (re)using textual data

Lecturer: Sofia Gil-Clavel

A key challenge to open science practices, such as sharing and reusing data, is that many researchers consider them a burdensome afterthought rather than a core part of the research itself. AmCAT (Amsterdam Content Analysis Toolkit) addresses this challenge by providing an open collaboration environment. As a collective benefit, AmCAT offers an open-source and decentralized solution for storing and analysing potentially sensitive documents, decreasing dependence on commercial or foreign providers and their terms of use throughout the research life cycle. It also facilitates non-consumptive research when full data access is impossible. Combining a user-friendly GUI with a powerful API, AmCAT ensures researchers with varying computational skills can participate. During this workshop, participants will learn to use AmCAT to discover, access, analyse, and (re)use textual data. Participants must bring their own laptops.

Building Shared Frameworks for GenAI Integration in the Social Sciences

Lecturer: Paulina Garcia-Corral

The landscape of content analysis (computational and otherwise) has fundamentally shifted. Once a niche domain requiring specialized technical expertise, computational content analysis has become broadly accessible with the emergence of Large Language Models (LLMs) and Large Multimodal Models (LMMs). These tools enable scalable, automated analysis of text, images, audio, and other forms of data, while significantly lowering the technical barriers to large-scale research. This democratization has led researchers across disciplines and methodological traditions to integrate LLMs into their workflows. Increasingly, scholars seek to move beyond copy-paste interactions with web applications toward systematic, transparent, and reproducible approaches. As these communities of practice expand, the need for shared vocabularies, pedagogical frameworks, and cross-disciplinary best practices becomes urgent. This workshop brings together researchers and educators at a critical juncture: How can we effectively teach, evaluate, and integrate LLM-based methods for audiences with no or limited programming experience while maintaining methodological rigor and reproducibility?

Key questions that we will address:

  • How can we communicate LLM capabilities and limitations across varying levels of technical expertise?
  • Which pedagogical frameworks are most effective for teaching LLM applications to non-technical audiences?
  • What constitutes best practice for systematic LLM integration without requiring programming?
  • How can we ensure responsible, transparent, and reproducible use of these tools in research contexts?

DSA Data Access 101 – What the Digital Services Act means for Researcher Access to Online Platforms

Lecturers: LK Seiling, Jakob Ohme

As online platforms, and especially social media, have taken on an increasingly important role in communication within small communities as well as across the globe, researcher access to platform data has also grown in relevance. Until recently, however, researchers had to rely on personal connections to platform providers or on their good will in order to make use of the vast amount of data transmitted through and collected by these intermediary services. Most recently, the European Union’s Digital Services Act (DSA) has established a new legal basis for data access: Article 40 DSA requires platforms to provide both public and non-public data to researchers – as long as they meet a set of criteria and study systemic risk.

This workshop aims to introduce participants to the research potential unlocked by this regulation and provide them with everything they need to know to start drafting their own access requests. To that end, it will introduce the context of the DSA and Article 40, including recent enforcement actions by the European Commission, and map the data access pathways set out in the DSA, as well as some of the ongoing monitoring and advocacy efforts around data access. Based on an honest assessment of the current challenges, participants will learn how to anticipate and address them in order to effectively use the new framework to their benefit – and start submitting their own data access applications.

From Desktop to Cluster: Scaling Computational Social Science with High-Performance Computing

Lecturer: Andreas Küpfer

Computational social science increasingly relies on large-scale data and resource-intensive models embedded in complex research pipelines. These requirements often exceed the limits of standard desktop computing. This beginner-friendly workshop introduces high-performance computing (HPC) as a practical research infrastructure for social scientists. Participants will learn the difference between CPU and GPU computing and how to navigate an HPC system. They will then set up, submit, and monitor computation scripts using a job scheduler. The workshop concludes with an outlook on workflow managers, demonstrating how automated pipelines can be utilised to structure, extend, and robustly execute computational social science analyses.

Afternoon

AI-Powered Qualitative Analysis in R

Lecturer: Kenneth Benoit

This hands-on workshop introduces AI-assisted qualitative analysis in R, guiding participants through flexible, accessible workflows to annotate and code text, images, audio, PDFs, and tabular data with minimal programming. Participants will learn to connect to large language models such as ChatGPT, Claude, Gemini, and open-source options via Ollama, as well as design reliable prompts, build reproducible pipelines, validate and document analytic decisions, and apply ethical data-handling practices. By integrating these AI tools into standard qualitative research workflows, the workshop empowers researchers to enhance coding, thematic analysis, and interpretation while maintaining transparency and rigor for publication-ready outputs.

Introduction to Automated Analysis of Multimodal Data

Lecturer: Nadezhda Ozornina

Multimodal content, combining text, images, and audio, is becoming increasingly central to social media; however, computational methods for analyzing such data remain underexplored. This beginner-friendly workshop introduces participants to key strategies for integrating and jointly analyzing multiple modalities, covering a variety of methods, including topic modeling and clustering. The session includes hands-on exercises requiring basic Python knowledge, along with discussions of theoretical frameworks and validation strategies for applying these methods in social science research.

From Appendix to Spotlight: Validation Standards in Text-as-Data Political Science

Lecturers: Christopher Klamm and Steffen Bastián

This workshop provides (1) an overview of current validation challenges, tools, and practices in text-as-data research and (2) an open discussion space focused on identifying the needs of social scientists, potential standards, and best practices moving forward. The goal is not to prescribe a single solution, but to foster shared understanding and community-driven reflection on validation in computational social science (CSS).

From Concept to Computation: Identifying Group Appeals in Text

Lecturers: Alona Dolinsky and Dylan Paltra

Political actors and institutions frequently employ social groups in their communication, but systematically identifying such appeals is conceptually and methodologically challenging. This workshop introduces participants to the study of group appeals in public discourse via computational text analysis. Participants will learn the fundamentals of creating an annotated group-appeals training corpus and the computational approaches currently used to scale up detection across large corpora using R and Python. This will enable them to refine existing models and systematically apply established methods to their own research. Drawing on the Parties’ Social Group Appeals (PSoGA) framework, the workshop will emphasize the importance of validation at each stage. Participants will also conduct a practical exercise to measure group appeals in real-world texts.

Biased by design? Unpacking social bias in Computational Social Science

Lecturer: Ahrabhi Kathirgamalingam

Social bias is not peripheral to Computational Social Science; it is deeply embedded in both the field and its methodology. Drawing on recent literature and empirical work, this workshop explores how social bias shapes CSS research and how CSS methods can, in turn, help detect and critically examine social bias. Through an interactive format, we will first unpack the complex relationships between social bias and CSS. Participants will then reflect on the role of social bias in their own research contexts and projects. Together, using our own projects as starting points, we will develop strategies for integrating critical reflection, methodological pluralism, and innovation in ways that promote more rigorous and equitable CSS research.

During the event, photographs, video recordings, and audio recordings may be taken to document key moments, including interactions during workshops, roundtable discussions, presentations, coffee breaks, lunch breaks, and other informal conversations or meetings, such as networking sessions or spontaneous discussions between participants. These recordings may be used by the organisers for promotional purposes, including publication on official websites, social media channels, and in printed materials such as reports, posters, and brochures.

By registering for this event, participants explicitly consent to being recorded, and such agreement is a requirement for participation. Please note that images and recordings shared on public platforms may be reshared or republished by third parties, limiting the organisers’ ability to fully exercise participants’ Right to Erasure as set out in Article 17 of the General Data Protection Regulation (Regulation (EU) 2016/679, GDPR).