Work Experience
Research Assistant
08.2021 - Present
Darmstadt University of Technology     
Working on the BMBF-funded project InsightsNet, with a focus on knowledge mining from academic publications.
Machine Learning Intern
04.2020 - 10.2020
Robert Bosch GmbH     
Built time series models using RNNs, GANs, and VAEs to predict bicycle sensor data.
Publications
Layout-Aware Text Editing for Efficient Transformation of Academic PDFs to Markdown
Changxu Duan,
the 19th International Conference on Document Analysis and Recognition (ICDAR 2025)
[GitHub]
Accelerating End-to-End PDF to Markdown Conversion through Assisted Generation
Changxu Duan,
the 30th Annual International Conference on Natural Language & Information Systems (NLDB 2025)
[GitHub]
Bridging scientific publication accessibility: LaTeX-markup-PDF-alignment
Changxu Duan,
the Communications of the TeX Users Group (TUGboat) 45:2, 2024.
[Paper]
[Slides]
LaTeX Rainbow: Universal LaTeX to PDF Document Semantic & Layout Annotation Framework
Changxu Duan, Zhiyin Tan, Sabine Bartsch,
the second Workshop on Information Extraction from Scientific Publications at IJCNLP-AACL 2023.
[Paper]
[GitHub]
Presenting an Annotation Pipeline for Fine-grained Linguistic Analyses of Multimodal Corpora
Elena Volkanovska, Sherry Tan, Changxu Duan, Debajyoti Paul Chowdhury, Sabine Bartsch,
the first Workshop on Linguistic Insights from and for Multimodal Language Processing at KONVENS 2023.
[Paper]
The InsightsNet Climate Change Corpus (ICCC)
Sabine Bartsch, Changxu Duan, Sherry Tan, Elena Volkanovska, Wolfgang Stille,
Datenbank-Spektrum, 2023.
[Paper]
Projects & Tools
Explaining the idea of language model generation through Jabber and entity linking
Poster presentation at Machine Learning Operations Summer School 2022
This work addresses hallucination and imprecision in language model generation by running controlled generation (“babbling”) on the same problem and validating the outputs through entity linking to a knowledge graph.
Semi-supervised Event-centered Emotion Analysis and Performance Prediction
Master's Thesis at Robert Bosch GmbH
This work explores event-centered emotion analysis in a semi-supervised learning setting by designing a similarity-based method for selecting and clustering unlabeled data, and by identifying key task attributes for performance prediction.
SPARQL Auto-Completion (VS Code)
Developed a VS Code extension offering SPARQL language support through automatic prefix completion and IntelliSense suggestions for classes and properties.
[GitHub]