Changxu Duan

NLP Researcher

Education

PhD Student in Computational Linguistics

2021 - Present

Technical University of Darmstadt

M.Sc. Computational Linguistics

2018 - 2021

University of Stuttgart

B.Sc. Computer Sciences

2014 - 2018

Henan Normal University

Work Experience

Research Assistant

08.2021 - Present

Darmstadt University of Technology

Working on the BMBF-funded project InsightsNet, with a focus on knowledge mining from academic publications.

Machine Learning Intern

04.2020 - 10.2020

Robert Bosch GmbH

Built time series models with RNNs, GANs, and VAEs to predict bicycle sensor readings.

Publications

Layout-Aware Text Editing for Efficient Transformation of Academic PDFs to Markdown

Changxu Duan, the 19th International Conference on Document Analysis and Recognition (ICDAR 2025) [GitHub]

Accelerating End-to-End PDF to Markdown Conversion through Assisted Generation

Changxu Duan, the 30th Annual International Conference on Natural Language & Information Systems (NLDB 2025) [GitHub]

Bridging scientific publication accessibility: LaTeX-markup-PDF-alignment

Changxu Duan, the Communications of the TeX Users Group (TUGboat) 45:2, 2024. [Paper] [Slides]

LaTeX Rainbow: Universal LaTeX to PDF Document Semantic & Layout Annotation Framework

Changxu Duan, Zhiyin Tan, Sabine Bartsch, the second Workshop on Information Extraction from Scientific Publications at IJCNLP-AACL 2023. [Paper] [GitHub]

Presenting an Annotation Pipeline for Fine-grained Linguistic Analyses of Multimodal Corpora

Elena Volkanovska, Sherry Tan, Changxu Duan, Debajyoti Paul Chowdhury, Sabine Bartsch, the first Workshop on Linguistic Insights from and for Multimodal Language Processing at KONVENS 2023. [Paper]

The InsightsNet Climate Change Corpus (ICCC)

Sabine Bartsch, Changxu Duan, Sherry Tan, Elena Volkanovska, Wolfgang Stille, Journal of Datenbank-Spektrum, 2023. [Paper]

Projects & Tools

Explaining the idea of language model generation through Jabber and entity linking

Poster presentation at Machine Learning Operations Summer School 2022

This work addresses the hallucinatory and imprecise nature of language model generation by using controlled generation (“babbling”) on the same problem and validating outputs through entity linking to a knowledge graph.

Semi-supervised Event-centered Emotion Analysis and Performance Prediction

Master's Thesis at Robert Bosch GmbH

This work explores event-centered emotion analysis in a semi-supervised learning setting by designing a similarity-based method for selecting and clustering unlabeled data, and by identifying key task attributes for performance prediction.

SPARQL Auto-Completion (VS Code)

Developed a VS Code extension offering SPARQL language support through automatic prefix completion and IntelliSense suggestions for classes and properties. [GitHub]

Languages & Awards

  • Chinese (native), English (C1), German (B1)
  • First Award of Lan Qiao Algorithmic Competition in 2017; Bronze Medal of ACM-ICPC in Henan Province in 2017; First Award of Underwater Robot Competition in 2016; Third Award of RoboCup China Open in 2016.