Work Experience
Research Assistant
08.2021 - 12.2024
Technical University of Darmstadt     
BMBF-funded project
InsightsNet, focusing on knowledge mining from academic publications.
Machine Learning Intern and Master Thesis Student
04.2020 - 04.2021
Robert Bosch GmbH     
Crafted time series models with RNNs, GANs, and VAEs for bicycle sensor predictions.
Publications
Beyond Catalogue Counts: Quantifying Visibility Bias in Low-Resource Multilingual NLP
[Under review]
Multi-Disciplinary Dataset Discovery from Citation-Verified Literature Contexts
Zhiyin Tan*, Changxu Duan*, (*co-first)
the ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL 2025)
[Paper]
[Slides]
Semantically Orthogonal Framework for Citation Classification: Disentangling Intent and Content
Changxu Duan*, Zhiyin Tan*, (*co-first)
the 29th International Conference on Theory and Practice of Digital Libraries (TPDL 2025)
[GitHub]
[Paper]
[Slides]
Layout-Aware Text Editing for Efficient Transformation of Academic PDFs to Markdown
Changxu Duan,
the 19th International Conference on Document Analysis and Recognition (ICDAR 2025)
[GitHub]
[Paper]
[arXiv]
Accelerating End-to-End PDF to Markdown Conversion through Assisted Generation
Changxu Duan,
the 30th Annual International Conference on Natural Language & Information Systems (NLDB 2025)
[GitHub]
[Paper]
[arXiv]
Bridging scientific publication accessibility: LaTeX-markup-PDF-alignment
Changxu Duan,
the Communications of the TeX Users Group (TUGboat) 45:2, 2024.
[Paper]
LaTeX Rainbow: Universal LaTeX to PDF Document Semantic & Layout Annotation Framework
Changxu Duan, Zhiyin Tan, Sabine Bartsch,
the Workshop on Information Extraction from Scientific Publications at IJCNLP-AACL 2023.
[Paper]
[GitHub]
Presenting an Annotation Pipeline for Fine-grained Linguistic Analyses of Multimodal Corpora
Elena Volkanovska, Sherry Tan, Changxu Duan, Debajyoti Paul Chowdhury, Sabine Bartsch,
the first Workshop on Linguistic Insights from and for Multimodal Language Processing at KONVENS 2023.
[Paper]
The InsightsNet Climate Change Corpus (ICCC)
Elena Volkanovska, Sherry Tan, Changxu Duan, Sabine Bartsch, Wolfgang Stille,
Datenbank-Spektrum, 2023.
[Paper]
Projects & Tools
Explaining the idea of language model generation through Jabber and entity linking
Poster presentation at Machine Learning Operations Summer School 2022
This work addresses hallucination and imprecision in language model generation by applying controlled generation (“babbling”) to the same problem and validating the outputs through entity linking to a knowledge graph.
SPARQL Auto-Completion (VS Code)
Developed a VS Code extension offering SPARQL language support through automatic prefix completion and IntelliSense suggestions for classes and properties.
[GitHub]