Changxu Duan


I am currently a PhD student at Technische Universität Darmstadt, where I contribute to the BMBF-funded InsightsNet project under the supervision of Dr. Sabine Bartsch. Our work focuses on developing new methods for mining knowledge from academic publications. Previously, I completed an M.Sc. in Computational Linguistics at the Universität Stuttgart, where I worked with Prof. Roman Klinger, and a B.Sc. in Computer Science at Henan Normal University, supervised by Prof. Zhan-ao Xue.

My research interests center on knowledge mining, particularly extracting structured insights from unstructured data. I am also passionate about DevOps practices in the context of large language models.

Publications

ICDAR 2025
Long Paper
Layout-Aware Text Editing for Efficient Transformation of Academic PDFs to Markdown

Changxu Duan

[Web Page] [Code]

NLDB 2025
Long Paper
Accelerating End-to-End PDF to Markdown Conversion through Assisted Generation

Changxu Duan

[Web Page] [Code]

TUG 2024
Bridging scientific publication accessibility: LaTeX-markup-PDF-alignment

Changxu Duan

[Paper] [Slides]

AACLW 2023
LaTeX Rainbow: Universal LaTeX to PDF Document Semantic & Layout Annotation Framework

Changxu Duan, Zhiyin Tan, Sabine Bartsch

[Paper] [Code]

KONVENSW 2023
Presenting an Annotation Pipeline for Fine-grained Linguistic Analyses of Multimodal Corpora

Elena Volkanovska, Sherry Tan, Changxu Duan, Debajyoti Paul Chowdhury, Sabine Bartsch

[Paper]

Datenbank-Spektrum 2023
Journal
The InsightsNet Climate Change Corpus (ICCC)

Sabine Bartsch, Changxu Duan, Sherry Tan, Elena Volkanovska, Wolfgang Stille

[Paper]