Confirmed Contributions

TaCoS 2026 in Heidelberg

Talk (30min) (15)

Coded Cuneiform Courses: A Learning App for Cuneiform Signs (Kodierte Keilschriftkurse: Eine Lern-App für Keilschriftzeichen)

by Anna Fischer

Talk (30min)

In this talk I explore what a learning app for Sumero-Akkadian cuneiform could look like. Cuneiform is a logosyllabic script that was used by several peoples. It was developed in Mesopotamia in the late fourth millennium BCE and remained in use for more than 3000 years. The aim of my app is to offer a playful way to learn its readings and sign forms across different periods. The app will consist of two components: a collection of signs with their respective developmental stages, phonetic values, and logograms, from which users can assemble card decks; and a set of learning games that help users internalize these signs. I am building it with React Native, a JavaScript framework. The data comes from the Oracc Sign List [1]. The application already includes two learning games: a matching game and a learning game. I will begin the talk with a short introduction to cuneiform, its particularities, and the benefits of an app developed specifically for it. I will then present the current state of the application. Finally, I will consider which computational-linguistic methods could be built into the app to support learning. [1] https://oracc.museum.upenn.edu/osl/

Submitted on February 10, 2026

A statistical analysis of translation errors in machine translations.

by Cristina Gutiérrez Alcalá

Talk (30min)

This research examines whether machine translation, despite notable recent progress, can match the accuracy and correctness of human translators. AI tools like Google Translate, DeepL or ChatGPT are appreciated for their speed and convenience, but they usually perform poorly in important aspects such as context-related adaptation, cultural integration and style adjustment. This work suggests that these flaws underline the essential role of human translators and present Machine Translation as an assisting tool rather than a replacement. The research looks at translations produced by several tools for varied types of texts, such as technical, literary and legal ones. It identifies common mistakes, including wrong use of terms, inconsistent style and cultural mismatches, and checks their effect on the comprehension of specific content. The outcomes were then compared with established translation rules to show the continuing gaps in translations produced by AI-based systems. By examining the disparities across platforms and identifying recurring mistakes, this dissertation seeks to present a detailed view of the limits of Machine Translation. The results support using Machine Translation as a supplementary tool and highlight the need for human supervision to guarantee accuracy, precision, and cultural relevance in translations.

Submitted on February 17, 2026

Social Impact of AI

by Maya Arseven, Polina Kuznetcova

Talk (30min)

Are we, as computational linguists, doomed in the job market? Are we all going to be replaced by AI? Or, on the contrary, are we the luckiest generation of all? In this talk, we will explore how AI impacts the balance between creating new jobs and replacing existing ones. From recruiting to AI usage in the workplace, we will discuss the pros and cons of integrating AI into our daily work lives. Finally, we will look at how AI replaceability is calculated and what metrics are used to measure how AI-proof a job is.

Submitted on March 22, 2026

Graph Language Models

by Moritz Plenz

Talk (30min)

While Language Models (LMs) are the workhorses of NLP, their interplay with structured knowledge graphs (KGs) is still actively researched. Current methods for encoding such graphs typically either (i) linearize them for embedding with LMs – which underutilizes structural information, or (ii) use Graph Neural Networks (GNNs) to preserve the graph structure – but GNNs cannot represent text features as well as pretrained LMs. In our work we introduce a novel LM type, the Graph Language Model (GLM), that integrates the strengths of both approaches and mitigates their weaknesses. The GLM parameters are initialized from a pretrained LM to enhance understanding of individual graph concepts and triplets. Simultaneously, we design the GLM’s architecture to incorporate graph biases, thereby promoting effective knowledge distribution within the graph. This enables GLMs to process graphs, texts, and interleaved inputs of both. Empirical evaluations on relation classification tasks show that GLM embeddings surpass both LM- and GNN-based baselines in supervised and zero-shot settings, demonstrating their versatility.

Submitted on March 23, 2026

A Dual-Track Theory of Local Discourse Coherence in Humans and Language Models

by Souvik Banerjee

Talk (30min)

What makes a sequence of sentences feel coherent rather than disjointed? At the local level, psycholinguistic and computational evidence points to two distinct mechanisms operating simultaneously. Relational coherence tracks how adjacent sentences are logically connected: whether one explains, contrasts, or elaborates on the other. Entity coherence tracks which referents are in focus and how attention shifts between them. These two tracks are not independent: the type of relation between sentences shapes which entities stay in focus, and the entity configuration constrains which relations are plausible. Despite decades of work on each track in isolation, we lack a unified formal account of how they interact. We develop an information-theoretic framework that models both tracks jointly. At each word in a text, a reader maintains and updates beliefs about the discourse relation currently being constructed. We validate this model against human reading times. We then ask whether LLMs implement a similar dual-track structure internally using targeted interventions on model activations. We test whether relational and entity coherence are encoded in separable regions of the model's representation space, and whether they interact in the ways our theory predicts. This lets us ask: do LLMs solve discourse coherence the same way humans do, or do they find a fundamentally different solution?

Submitted on March 25, 2026

Can Social Interaction Help Ground AI Storytelling Ability?

by Long Le

Talk (30min)

Current methods for automatic storytelling often focus on plot structure while neglecting the representation of characters' emotional development. Based on how emotion progression in stories generated by language models differs from that in human-written stories, we argue that current methods are not enough to close this gap. Through a narratology-informed perspective, we propose to solve this problem by learning from an interactive environment that can ground language models in human emotional experience. We show that behaviour cloning and multi-turn reinforcement learning in this dynamic setting can help models generate more human-like stories while still maintaining generalization performance.

Submitted on March 27, 2026

Embedding Collapse in Two-Tower News Recommendation

by Martin Scheuermann

Talk (30min)

News recommendation is characterized by rapidly evolving content and highly skewed interaction data, making it challenging to learn robust user and article representations. While two-tower models are widely used for scalable retrieval, we show that they exhibit a critical failure mode in this setting: embedding collapse. We demonstrate that user and article embeddings quickly degenerate into a low-dimensional structure, limiting retrieval performance and leading to an overemphasis on popular content. This collapse emerges early during training and constrains downstream performance. To mitigate this issue, we propose and analyze spectral regularization, which encourages a more balanced use of the embedding space by preventing variance from concentrating in a few dominant directions. This leads to more stable training and improved recommendation quality.

Submitted on March 27, 2026

«John dropped the apple — Which apple?» Investigating LLM Reasoning with Attention Scores

by Bohdana Ivakhnenko; Kim Lingemann; Raziye Sari; Asma Motmem

Talk (30min)

With the continuous struggle to increase LLMs' reasoning performance, it has become apparent that true language understanding is a necessity, not a whim. Yet what does that look like exactly? The transformer attention mechanism is certainly central to it. But does that alone suffice to equip models with real reasoning capabilities? Building on recent studies of chain-of-thought (CoT) reasoning, we are on a mission to analyse why models fail at trivial tasks. Raw neuron attribution scores guide us to “where John really is”. While searching for clear answers, we are met, not once and not twice, by surprising discoveries… We invite you to join us on this journey of revealing problems hidden in plain sight, clever design strategies to gauge the limits of Llama models, and astonishing observations of how the models outsmarted us: rarely expected, but sure to become an apparent reality of today's LLMs. Get ready for proper research with standard deviation, deep analyses, and many visual examples.

Submitted on March 31, 2026

Beyond Names: Tracing the Stylistic Influence of Canonical Texts in Fanfiction

by Carina Zander, Denise Binz

Talk (30min)

This talk explores the extent to which canonical texts shape writing style in fanfiction. Drawing on a corpus of fanfictions from popular fandoms on Archive of Our Own, we investigate whether distinct stylistic traces persist beyond explicit references. To this end, we construct and compare two versions of the corpus: the original fanfiction texts as written by their authors, and a modified version in which fandom-specific proper names are removed using named entity recognition techniques. Using machine learning classification, we assess how accurately texts can be attributed to their respective fandoms in both conditions. The comparison allows us to isolate the contribution of implicit stylistic features from that of overt lexical markers. The results provide insight into how deeply source material informs fanfiction writing, highlighting the extent to which fandom-specific style operates independently of identifiable names and entities.

Submitted on March 31, 2026

Do AI Models Judge Like Humans? Generalization in German Ditransitive Constructions Across Native Speakers, L2 Learners, and Large Language Models

by Xinyu Jing

Talk (30min)

This study investigates the generalization ability of advanced learners of German whose first language is Chinese with respect to the ditransitive construction, and includes artificial intelligence models as a point of comparison in order to analyze differences between human and machine judgments of syntactic and semantic acceptability. To this end, an acceptability judgment experiment was conducted with native speakers of German, advanced learners of German with Chinese as their first language, and two large language models (DeepSeek and ChatGPT), all of whom rated a set of grammatically well-formed and ill-formed ditransitive sentences. The materials were varied systematically with respect to verb class and verb-construction compatibility, and included prototypical ditransitive verbs, peripheral verbs, and constructions that violate syntactic or semantic constraints. Comparing the ratings of the three groups identifies error patterns in second-language learners' generalization and shows how they differ from native speakers. At the same time, the performance of artificial intelligence on such linguistic judgment tasks is evaluated. The results are expected to show that learners of German with Chinese as their first language are more tolerant than native speakers when rating ill-formed constructions and tend to overgeneralize. The language models show an overall rating pattern similar to that of native speakers, but under certain semantic conditions they exhibit instabilities or a stronger orientation toward surface structures.

Submitted on March 31, 2026

Protein LLMs: Applying NLP Methods to Protein Sequence Modeling

by Elizaveta Dovedova, Artem Stetoi

Talk (30min)

Have you heard of protein bars? Protein coffee? Protein nachos? How about protein LLMs? Proteins are linear chains of amino acids that can be represented as sequences over a fixed alphabet. This makes them resemble natural language in a surprisingly useful way, and many NLP algorithms are directly applicable to modeling protein sequences. Capturing patterns in amino-acid sequences and linking those patterns to biological properties lets you predict protein behavior, infer their structure and function, and design them for specific applications, a computational task that would otherwise require extensive laboratory work. Starting with n-grams, the application of methods originating in NLP to computational biology has been studied for many years, leading to the recent emergence of protein-specific large language models (Protein LLMs). With my presentation, I would like to introduce CoLi students with no biological background to how their knowledge of NLP concepts and techniques (such as embeddings, transformers or masked language modeling) can be applied in the biological field.
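As a rough illustration of the n-gram starting point mentioned in the abstract, the following Python sketch counts overlapping n-grams in a sequence written with one-letter amino-acid codes. It is not taken from the talk; the function name `ngram_counts` and the toy sequence are made up for this example.

```python
from collections import Counter


def ngram_counts(sequence: str, n: int) -> Counter:
    """Count overlapping n-grams in a sequence of one-letter residue codes."""
    return Counter(sequence[i:i + n] for i in range(len(sequence) - n + 1))


# Invented toy sequence for illustration only, not a real protein.
toy_sequence = "MKTAYIAKQR"
bigrams = ngram_counts(toy_sequence, 2)
print(bigrams.most_common(3))
```

The same counting works on words or characters of natural-language text, which is the parallel the abstract draws.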

Submitted on March 31, 2026

Is Computational Linguistics dead?--An exploratory survey on CL job market across China and Europe

by Ran Li

Talk (30min)

This talk presents a small exploratory study of the current job market for computational linguistics and AI-related positions across China and Europe. Instead of focusing on a single company or sector, the project aims to collect and analyze publicly available job postings using web scraping methods. The goal is to gain a data-driven overview of where relevant positions are located, what skills are most frequently required, and how different roles are described. After collecting job descriptions from major recruitment platforms (LinkedIn, Boss, 51 Jobs), major AI companies (OpenAI, Google, Tencent, ByteDance, Baidu, Alibaba) and traditional enterprises seeking digital transformation, the data will be cleaned and analyzed to identify patterns in required competencies, such as NLP techniques, machine learning frameworks, LLM-related tasks, and engineering skills. Positions will be roughly categorized into several types, including data-focused roles, model training and fine-tuning, evaluation and alignment, and product-oriented NLP applications. The results will be visualized as a map showing geographical distribution, along with charts summarizing skill demand and role characteristics. This project is both a methodological exercise in simple large-scale text collection and analysis, and a practical attempt to better understand the landscape of career paths in computational linguistics. The talk will share preliminary findings and reflect on how job descriptions shape our understanding of the field.

Submitted on April 1, 2026

Between AI and Client Projects: Computational Linguistics in Practice (Zwischen KI und Kundenprojekten: Computerlinguistik in der Praxis)

by Sophia Ackermann

Talk (30min)

What do computational linguists actually do outside of research? This talk offers insights into day-to-day work at berns language consulting, a Düsseldorf-based company specializing in language processes and language technology. A computational linguist reports first-hand on how varied the profession is in a corporate setting: from analyzing and preparing language data, through evaluating and introducing tools, to advising clients and implementing tailored solutions. Anonymized examples from real industry projects illustrate how computational linguistics is applied in practice and what value it adds in different fields of application. The talk also points out current developments and challenges in the market, especially around AI, large language models, and the growing importance of high-quality language data. It is aimed at students interested in practical insights and career prospects in computational linguistics.

Submitted on April 8, 2026

The Levenshteins: A family of good ol' transparent methods applied to human language learning of German orthography (en)

by Nicolas Arnold

Talk (30min)

In computational linguistics, we enable computers to perform tasks in natural language processing and generation. Using such methods and models in the context of human language acquisition comes with its own set of challenges. In my interdisciplinary bachelor's thesis, I applied methods from computational linguistics to the didactical setting, modelling the phonetic plausibility of spelling errors by early language learners. The idea is quite simple: Given a misspelled word, let's predict its pronunciation and find candidate words in the lexicon that sound similar. What could possibly go wrong? I used openly available tools like BAS Web Services [1] and epitran [2] for grapheme-to-phoneme (G2P) conversion, PHOIBLE [3] for phoneme inventories and phonetic features, and took spelling errors from the Litkey-Corpus [4] for evaluation. I will point out some challenges between linguistic theory, language acquisition research and the applied contexts of language teaching/learning and show how they affect the perspective from computational linguistics. Furthermore, I will present problems and surprising failures of existing methods while answering some critical questions:

- Does grapheme-based spelling correction really outperform phoneme-based correction?
- Is there still a place for simple algorithms like Levenshtein distance, heuristics and rule-based approaches, or even linguistic theory in modern computational linguistics?
- Learner language does not follow all the rules. Will the prediction of phonemes still work for misspelled words?

All in all, are we ready for learner language yet?

The spelling examples are in German. However, German skills are not needed to follow the talk. The talk is held in English.

[1] https://clarin.phonetik.uni-muenchen.de/BASWebServices/interface/Grapheme2Phoneme
[2] https://github.com/dmort27/epitran
[3] https://phoible.org/
[4] https://www.linguistics.rub.de/litkeycorpus/
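For readers unfamiliar with the Levenshtein distance named in the abstract, here is a minimal Python sketch of the textbook algorithm. It works on any sequence, so it can compare spellings (strings) or pronunciations (lists of phoneme symbols). The misspelling "Fater" and the hand-written phoneme lists are invented for illustration; they are not output from BAS Web Services or epitran, and this is not the speaker's implementation.

```python
def levenshtein(a, b):
    """Edit distance between two sequences (strings or lists of symbols)."""
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(
                previous[j] + 1,         # delete item_a
                current[j - 1] + 1,      # insert item_b
                previous[j - 1] + cost,  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


# Grapheme level: the hypothetical misspelling differs in one letter.
grapheme_distance = levenshtein("Fater", "Vater")

# Phoneme level: hand-written transcriptions (an assumption for this sketch).
phoneme_distance = levenshtein(["f", "aː", "t", "ɐ"], ["f", "aː", "t", "ɐ"])

print(grapheme_distance, phoneme_distance)
```

In this toy case the spellings differ while the assumed pronunciations are identical, which is the kind of phonetically plausible error the abstract describes.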

Submitted on April 26, 2026

The Unbelievable Universe of Unicode

by Jakob Moser

Talk (30min)

Whenever a computer needs to store or process text, it represents it as a sequence of numbers. But how does that work? This is where Unicode comes in. Given a character, Unicode encodes it into a number. The Unicode Consortium has an ambitious goal: It wants to be able to represent any character or sign there is in any writing system, making Unicode a truly universal standard. But with such universality also come a few seemingly unbelievable things: - Why is `len("👨🏾‍💻") == 4`? - Why is uni-heidelberg.de not the same as uni‐heiⅾelberg.de? - ‮Why does this text appear backwards? - And why is `"á" != "á"`? This talk is an assortment of Unicode-related trivia, which might help you during your next project, or at least help you to gain a deeper appreciation for the complexity behind the seemingly mundane.
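The puzzles in the abstract can be reproduced with Python's standard `unicodedata` module. The sketch below uses escape sequences so that the code points are visible; which code points the abstract itself contains is an assumption based on how the examples render (a ZWJ emoji sequence, a precomposed versus a decomposed "á", the look-alike letter U+217E, and the right-to-left override U+202E).

```python
import unicodedata

# Man + skin-tone modifier + zero-width joiner + laptop: one visible emoji,
# but four code points, and len() counts code points.
technologist = "\U0001F468\U0001F3FE\u200D\U0001F4BB"
print(len(technologist))

# Two ways to write "á" that look the same but compare unequal.
precomposed = "\u00E1"  # LATIN SMALL LETTER A WITH ACUTE
decomposed = "a\u0301"  # "a" followed by COMBINING ACUTE ACCENT
print(precomposed == decomposed)
print(unicodedata.normalize("NFC", decomposed) == precomposed)

# A look-alike of "d": compatibility normalization maps it to plain "d".
lookalike = "\u217E"  # SMALL ROMAN NUMERAL FIVE HUNDRED
print(lookalike == "d", unicodedata.normalize("NFKC", lookalike) == "d")

# The invisible control character that makes following text run backwards.
print(unicodedata.name("\u202E"))
```

Normalizing before comparing strings (NFC for canonical equivalence, NFKC when look-alikes should also be folded) is the usual way to avoid surprises like these.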

Submitted on May 19, 2026

Workshop (1h) (3)

Programming Language that Functions – Haskell for Computational Linguists

by Pranav Singh

Workshop (1h)

I'm sick of Python and I know you are too. Wipe your memory of GPUs and stop Hugging Faces – instead, turn your attention to functional programming: an explainable and elegant way to model structures and interactions in language. In this workshop, we'll learn the basics of Haskell, a language embraced by computational linguists for years, and use it to model real linguistic phenomena (focusing on syntax, morphology, and phonology). For linguists, this workshop will serve as practice in breaking down observed phenomena into their implementation details, designing the right data structures, and manipulating them with the right algorithms. For computer scientists, this workshop will build the intuition to think functionally (foundational to other functional languages like OCaml and Scala) and add a new programming paradigm to your toolbox. All code will be available on GitHub so that attendees can build these concepts further and use them for their own pet projects :)

Submitted on February 9, 2026

Uncovering Political Rhetoric with Corpus Tools: A Hands-On Introduction to Skelog and ParlaMint

by Isabell Furkert

Workshop (1h)

Parliamentary debates are a good resource for studying political discourse - but analysing thousands of speeches by hand is impossible. This workshop serves as a hands-on introduction to Skelog, a free web-based corpus analysis platform provided by CLARIN, and a look into the ParlaMint dataset, a large, metadata-rich collection of parliamentary proceedings from 29 European parliaments. Together, these tools enable new possibilities for empirical, reproducible research into political language. Participants will receive a guided walkthrough of Skelog's core features - subcorpus creation and keyword analysis - illustrated with concrete examples. No prior experience with corpus linguistics or programming is required.

Submitted on March 29, 2026

The cost of cutting corners

by Mariia Neguliaeva

Workshop (1h)

Real-life computational projects have a way of failing at the worst possible moment. This workshop stems from one such project on sign language translation, which sits at an intersection of linguistics, computer vision and accessibility technology - a real problem with real stakes, but most importantly a real appetite for computational resources. The gap between the project's ambitious beginning and considerably humbler reality is where the real work happens, as the value of defensive coding gradually replaces hopeful optimism. We'll explore that gap together through an interactive game (no spoilers!) and then discuss some tips on making large-scale computing bearable. What does it take to survive overnight crashes, while staying within budget? And how come we don't always do it? This session is for you, if you've ever been too scared to waste scarce resources or wondered why things go wrong when you seem to have done everything right. No prior experience with large-scale computing or sign language translation required. Curiosity and pent-up frustration welcome. Not validated by controlled studies, individual results may vary ;)

Submitted on March 31, 2026

Lightning Talk (5-10min) (5)

Adversarial Limits of Slavic Automatic Speech Recognition

by Ekaterina Voronina

Lightning Talk (5-10min)

The integration of automatic speech recognition (ASR) into daily life makes speech privacy against adversarial attacks a critical concern. This research investigates the robustness of the Whisper Base model across the Slavic language family compared to English. Our results show that vulnerability is task-dependent. In simple phrase-forcing attacks, high-resource languages like English and Russian are the easiest to subvert, with training volume being the primary predictor of success. However, in semantic modification attacks, such as numerical replacement, success depends more on grammatical complexity than on data volume. We also identify a “proximity effect”: while Belarusian’s closeness to Russian increases its vulnerability, more distant languages like Czech demonstrate significantly higher inherent robustness. Interestingly, additional fine-tuning in Russian did not significantly alter attack success for any language. These findings suggest that ASR security is shaped by a complex interplay of training resources and linguistic relatedness.

Submitted on March 6, 2026

Interslavic in mBERT: Language Family Alignment, Script Gap and LAFT

by Diana Pavlenok

Lightning Talk (5-10min)

Interslavic (ISV) is a planned language intended to be widely intelligible across Slavic languages and used online in both Latin and Cyrillic scripts. Because it is largely absent from standard NLP benchmarks, ISV offers a practical test of how multilingual encoders represent an unseen but family-adjacent language. In our experiment, we probe ISV in multilingual BERT (mBERT) using a controlled lexical setup based on the PanLex Swadesh-207 concept list. From contextual embeddings, we compute layer-wise cosine distance matrices and track (i) how close ISV (Latin and Cyrillic variants) is to Slavic vs. non-Slavic languages and (ii) how large the Latin–Cyrillic script gap is across layers. We then apply language-adaptive fine-tuning (LAFT) on raw ISV text from Wikimedia Incubator and repeat the probe. Baseline mBERT already places ISV closer to Slavic languages than to non-Slavic families. After LAFT, both ISV variants become more Slavic-aligned, with the strongest shifts in mid layers. Script effects are strongly layer-dependent: the Latin–Cyrillic gap persists in lower layers but shrinks noticeably in higher layers after adaptation.
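To make the cosine distance matrix concrete, here is a small pure-Python sketch. The three-dimensional vectors and their labels are invented toy values, not mBERT embeddings, and the helper names are hypothetical; in the setup described above, one such matrix would be computed per layer.

```python
import math


def cosine_distance(u, v):
    """1 minus the cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return 1.0 - dot / (norm_u * norm_v)


def distance_matrix(vectors):
    """Pairwise cosine distances for a dict mapping labels to vectors."""
    labels = list(vectors)
    return {a: {b: cosine_distance(vectors[a], vectors[b]) for b in labels}
            for a in labels}


# Invented toy vectors standing in for averaged embeddings at one layer.
toy_vectors = {
    "isv_latin": [1.0, 0.2, 0.0],
    "isv_cyrillic": [0.9, 0.3, 0.1],
    "non_slavic": [0.0, 0.1, 1.0],
}
matrix = distance_matrix(toy_vectors)
print(matrix["isv_latin"]["isv_cyrillic"], matrix["isv_latin"]["non_slavic"])
```

A small distance between the two ISV entries relative to the third corresponds to a small script gap in the sense used in the abstract.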

Submitted on March 16, 2026

Visualisation of English Collocational Patterns for Educational Purposes

by Koji Okumura

Lightning Talk (5-10min)

Collocation, a typical combination of words, is one of the areas that English learners often find challenging. This presentation provides an overview of my project to visualise collocational patterns for educational purposes. Existing applications, such as #LancsBox, can visualise collocation networks from a single corpus. This project, however, aims to compare native speakers’ and learners’ corpora in order to identify patterns of collocational overuse and underuse by learners. It focuses on collocational patterns between verbs and object nouns. The nouns (or verbs) that co-occur with a target verb (or noun) are visualised as a network, highlighting similarities and differences between native speakers and learners. In this network, edge thickness and edge length, as well as node position, encode statistical information derived from the corpora, including relative frequency and association measures for each collocational pattern. The visualisation will be implemented as a web application and is planned to include features for viewing example sentences and generating practice exercises. The pedagogical usefulness of this application will also be discussed.
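The statistics mentioned in the abstract can be sketched as follows. The abstract does not say which association measure the project uses, so pointwise mutual information (PMI) is an assumption here, and the verb-object pairs are invented toy data rather than corpus output.

```python
import math
from collections import Counter


def relative_frequencies(pairs):
    """Share of all (verb, noun) tokens taken up by each pair type."""
    counts = Counter(pairs)
    return {pair: count / len(pairs) for pair, count in counts.items()}


def pmi_scores(pairs):
    """Pointwise mutual information (base 2) for each (verb, noun) pair."""
    pair_counts = Counter(pairs)
    verb_counts = Counter(verb for verb, _ in pairs)
    noun_counts = Counter(noun for _, noun in pairs)
    total = len(pairs)
    return {
        (verb, noun): math.log2(
            count * total / (verb_counts[verb] * noun_counts[noun])
        )
        for (verb, noun), count in pair_counts.items()
    }


# Invented toy data: verb-object pairs from a "native" and a "learner" corpus.
native = [("make", "decision"), ("make", "decision"),
          ("take", "break"), ("make", "mistake")]
learner = [("do", "decision"), ("make", "decision"),
           ("do", "mistake"), ("take", "break")]

native_freq = relative_frequencies(native)
learner_freq = relative_frequencies(learner)
pair = ("make", "decision")
print(native_freq[pair] - learner_freq[pair])
print(pmi_scores(native)[pair])
```

A positive difference in relative frequency for a pair would point to underuse by learners; in a network view, values like these could drive edge thickness or length.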

Submitted on March 17, 2026

Swear-Word-Replacer

by Penelope Müller

Lightning Talk (5-10min)

Swear-Word-Replacer is an experimental text-based game (in German) that takes a humorous look at replacing insulting adjectives. A fine-tuned German GPT-2 model generates example sentences containing pairs of a swear-word adjective and a noun. Players suggest alternative, positive adjectives, which are then scored automatically on phonetic similarity, semantic plausibility, syntactic context, and sentiment. The project was developed as part of a module examination (in the BA Computational Linguistics programme at HHU).

Submitted on March 24, 2026

Text Characteristics Reveal AI-Written News

by Anastasia Yablokova, Felix Thielen, Selina Gabric, Ahmed Mahmoud

Lightning Talk (5-10min)

The increasing use of large language models in the news and media domain raises the question of how AI-generated news can be distinguished from human-written articles. In this project we investigate whether linguistic features can reveal systematic differences between AI-generated and human-written text. We collected a custom English news dataset from the BBC News Archive and the Cornell Newsroom corpus, then generated corresponding AI-written articles using open-weight language models. Using the Text Characterization Toolkit, we extracted linguistic features and tested them for statistical significance. Based on the significant features, we trained an XGBoost classifier and compared it to a fine-tuned RoBERTa baseline. Our results show that AI-generated and human-written news differ significantly across a range of stylistic and structural features. While RoBERTa achieved near-perfect results on in-domain news data, its performance dropped sharply on out-of-domain data. In contrast, the feature-based XGBoost model was more robust across datasets, suggesting that interpretable linguistic features may provide a stronger basis for generalizable AI-text detection.

Submitted on March 31, 2026