Profile
I am an enthusiastic researcher bridging the realms of language and AI.
With a strong penchant for knowledge mining,
I excel in deriving structured insights from unstructured data.
My expertise extends to DevOps for Large Language Models (LLMs),
showcasing my commitment to harnessing cutting-edge AI solutions.
Work Experience
Research Assistant
Darmstadt University of Technology      08.2021 - Present
BMBF-funded project InsightsNet, focused on knowledge mining from academic publications.
- Developed an annotation pipeline for parsing these publications.
- Established cross-disciplinary connections based on the extracted content.
Machine Learning Intern
Robert Bosch GmbH      04.2020 - 10.2020
Crafted time-series models with RNNs, GANs, and VAEs for bicycle sensor predictions.
- Optimized data preprocessing and model tuning, and stayed abreast of ML and time-series trends.
- Partnered with cross-functional teams for insights and model refinement.
Publications
Accelerating End-to-End PDF to Markdown Conversion through Assisted Generation
Changxu Duan,
under review
Bridging scientific publication accessibility: LaTeX-markup-PDF-alignment
Changxu Duan,
The Communications of the TeX Users Group (TUGboat) 45:2, pp. 179-184, 2024.
LaTeX Rainbow: Universal LaTeX to PDF Document Semantic & Layout Annotation Framework
Changxu Duan, Zhiyin Tan, Sabine Bartsch,
In Proceedings of the Second Workshop on Information Extraction from Scientific Publications at IJCNLP-AACL 2023.
Presenting an Annotation Pipeline for Fine-grained Linguistic Analyses of Multimodal Corpora
Elena Volkanovska, Sherry Tan, Changxu Duan, Debajyoti Paul Chowdhury, Sabine Bartsch,
In Proceedings of the First Workshop on Linguistic Insights from and for Multimodal Language Processing at KONVENS 2023.
The InsightsNet Climate Change Corpus (ICCC)
Sabine Bartsch, Changxu Duan, Sherry Tan, Elena Volkanovska, Wolfgang Stille,
Datenbank-Spektrum, ISSN 1610-1995, 2023.
Projects & Tools
Explaining the idea of language model generation through Jabber and entity linking
Poster presentation at Machine Learning Operations Summer School 2022
- Text generated by language models is often hallucinatory and imprecise.
- Attempted to validate generations by steering the language model to babble on the same problem and by linking entities in its output to a knowledge graph.
Semi-supervised Event-centered Emotion Analysis and Performance Prediction
Master's Thesis at Robert Bosch GmbH
Investigated the applicability of event-centered emotion analysis in semi-supervised learning (SSL).
- Designed a similarity-based method to select unlabeled data and manage clustering attributes.
- Identified key attributes of SSL Emotion Analysis tasks for performance prediction.
RainbowLaTeX Annotator
https://github.com/InsightsNet/texannotate
Transforms LaTeX code into detailed PDF layouts, specifically for academic publications.
- LaTeX Compilation: compiles LaTeX to PDF with a dockerized TeX Live 2023.
- LaTeX Annotation: applies color labels to LaTeX elements for layout extraction.
- Data Extraction: extracts text and figures from PDFs, identifying both their types and positions.
SPARQL Auto-Completion (VS Code)
https://github.com/Fireblossom/sparql-auto-completion
A VS Code extension that provides language support for SPARQL queries, with two main functionalities.
- Auto-complete prefix: When typing a prefix (e.g., `foaf:`), it automatically completes the full prefix declaration (`PREFIX foaf: `) in the file header.
- IntelliSense for terms: Provides intelligent suggestions for classes and properties.