Changxu Duan

NLP Researcher

Contact

location

Darmstadt, Hessen, Germany

email

changxu.duan@foxmail.com

Education

PhD Student in Computational Linguistics

Technical University of Darmstadt
2021 - Present

M.Sc. Computational Linguistics

University of Stuttgart
2018 - 2021

B.Sc. Computer Sciences

Henan Normal University
2014 - 2018

Expertise

  • Machine Learning
  • Deep Learning
  • Natural Language Processing
  • Large Language Models
  • Algorithms
  • DevOps
  • Data Annotation
  • Time Series Analysis

Skills

  • Python | Java | C | C++
  • Django | Flask | Streamlit
  • PyTorch | Hugging Face
  • spaCy | NLTK | Stanza
  • NumPy | scikit-learn
  • SQL | SPARQL
  • MySQL | ArangoDB
  • Git
  • Docker
  • Bash | PowerShell

Languages

  • Chinese Native
  • English C1
  • German B1

Awards

First Award

Lan Qiao Algorithmic Competition
2017

Bronze Medal

ACM-ICPC in Henan Province
2017

First Award

Underwater Robot Competition
2016

Third Award

RoboCup China Open
2016

Profile

I am an enthusiastic researcher bridging the realms of language and AI. With a strong penchant for knowledge mining, I excel in deriving structured insights from unstructured data. My expertise extends to DevOps for Large Language Models (LLMs), reflecting my commitment to applying cutting-edge AI solutions.

Work Experience

Research Assistant

Darmstadt University of Technology      08.2021 - Present

Worked on the BMBF-funded project InsightsNet, focusing on knowledge mining from academic publications.

  • Developed an annotation pipeline for parsing these publications.
  • Established cross-disciplinary connections based on the extracted content.

Machine Learning Intern

Robert Bosch GmbH      04.2020 - 10.2020

Crafted time series models with RNNs, GANs, and VAEs for bicycle sensor predictions.

  • Optimized data preprocessing and model tuning; stayed abreast of ML and time series trends.
  • Partnered with cross-functional teams for insights and model refinement.

Publications

Accelerating End-to-End PDF to Markdown Conversion through Assisted Generation

Changxu Duan, under review

Bridging scientific publication accessibility: LaTeX-markup-PDF-alignment

Changxu Duan, The Communications of the TeX Users Group (TUGboat) 45:2, pp. 179-184, 2024.

LaTeX Rainbow: Universal LaTeX to PDF Document Semantic & Layout Annotation Framework

Changxu Duan, Zhiyin Tan, Sabine Bartsch, In Proceedings of the second Workshop on Information Extraction from Scientific Publications at IJCNLP-AACL 2023.

Presenting an Annotation Pipeline for Fine-grained Linguistic Analyses of Multimodal Corpora

Elena Volkanovska, Sherry Tan, Changxu Duan, Debajyoti Paul Chowdhury, Sabine Bartsch, In Proceedings of the first Workshop on Linguistic Insights from and for Multimodal Language Processing at KONVENS 2023.

The InsightsNet Climate Change Corpus (ICCC)

Sabine Bartsch, Changxu Duan, Sherry Tan, Elena Volkanovska, Wolfgang Stille, Journal of Datenbank-Spektrum, pp. 1610-1995. 2023.

Projects & Tools

Explaining the idea of language model generation through Jabber and entity linking

Poster presentation at Machine Learning Operations Summer School 2022

  • Text generated by language models is often hallucinatory and imprecise.
  • Attempted to validate the output by steering the language model to babble on the same problem and applying entity linking against a knowledge graph.

Semi-supervised Event-centered Emotion Analysis and Performance Prediction

Master Thesis at Robert Bosch GmbH

Investigated the applicability of event-centered emotion analysis in semi-supervised learning (SSL).

  • Designed a similarity-based method to select unlabeled data and manage clustering attributes.
  • Identified key attributes of SSL emotion analysis tasks for performance prediction.

RainbowLaTeX Annotator

Transforms LaTeX source code into PDFs with detailed layout annotations, targeting academic publications.

  • LaTeX Compilation: compiled LaTeX to PDF with a Dockerized TeX Live 2023.
  • LaTeX Annotation: applied color labels to LaTeX elements for layout extraction.
  • Data Extraction: extracted text and figures from PDFs, identifying both their types and positions.

SPARQL Auto-Completion (VS Code)

A VS Code extension that provides language support for SPARQL queries, with two main functionalities.

  • Auto-complete prefix: When typing a prefix (e.g., `foaf:`), it automatically completes the full prefix declaration (`PREFIX foaf: `) in the file header.
  • IntelliSense for terms: Provides intelligent suggestions for classes and properties.
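
The prefix auto-completion idea can be illustrated with a minimal Python sketch. It is not the extension's actual code: the `KNOWN_PREFIXES` table, the regular expressions, and the `add_missing_prefixes` helper are all hypothetical names introduced for illustration, and the real extension may resolve prefixes differently.

```python
import re

# Hypothetical lookup table (an assumption for this sketch); the extension's
# real source of prefix-to-IRI mappings is not described above.
KNOWN_PREFIXES = {
    "foaf": "http://xmlns.com/foaf/0.1/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}

# Matches existing declarations such as "PREFIX foaf: <...>".
DECLARATION = re.compile(r"^\s*PREFIX\s+([A-Za-z][\w-]*):", re.IGNORECASE | re.MULTILINE)
# Matches prefixed names such as "foaf:name", skipping IRIs like "<http://...>".
USAGE = re.compile(r"(?<![\w<:/])([A-Za-z][\w-]*):(?!//)")


def add_missing_prefixes(query: str, known: dict[str, str] = KNOWN_PREFIXES) -> str:
    """Prepend a PREFIX declaration for each known prefix used but not declared."""
    declared = {m.group(1) for m in DECLARATION.finditer(query)}
    missing = []
    for m in USAGE.finditer(query):
        name = m.group(1)
        if name in known and name not in declared and name not in missing:
            missing.append(name)
    header = "".join(f"PREFIX {name}: <{known[name]}>\n" for name in missing)
    return header + query


if __name__ == "__main__":
    print(add_missing_prefixes("SELECT ?name WHERE { ?p foaf:name ?name }"))
```

Running the function a second time on its own output leaves the query unchanged, because the prefix is then already declared in the header.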