Changxu Duan

NLP Researcher

Contact

location

Darmstadt, Hessen, Germany

email

changxu.duan@foxmail.com

Education

M.Sc. Computational Linguistics

University of Stuttgart
2018 - 2021

B.Sc. Computer Sciences

Henan Normal University
2014 - 2018

Expertise

  • Machine Learning
  • Deep Learning
  • Natural Language Processing
  • Large Language Models
  • Algorithms
  • DevOps
  • Data Annotation
  • Time Series Analysis

Skills

  • Python | Java | C | C++
  • Django | Flask | Streamlit
  • PyTorch | Hugging Face
  • spaCy | NLTK | Stanza
  • NumPy | Scikit-learn
  • SQL | SPARQL
  • MySQL | ArangoDB
  • Git
  • Docker
  • Bash | PowerShell

Languages

  • Chinese Native
  • English C1
  • German B1

Awards

First Award

Lan Qiao Algorithmic Competition
2017

Bronze Medal

ACM-ICPC in Henan Province
2017

First Award

Underwater Robot Competition
2016

Third Award

RoboCup China Open
2016

Profile

I am an enthusiastic researcher working at the intersection of language and AI. My focus is knowledge mining: deriving structured insights from unstructured data. My expertise extends to DevOps for Large Language Models (LLMs).

Work Experience

Research Assistant

Darmstadt University of Technology      08.2021 - Present

Worked on the BMBF-funded project InsightsNet, focusing on knowledge mining from academic publications.

  • Developed an annotation process pipeline for parsing these publications.
  • Established cross-disciplinary connections based on the extracted content.

Machine Learning Intern

Robert Bosch GmbH      04.2020 - 10.2020

Crafted time series models with RNNs, GANs, and VAEs for bicycle sensor predictions.

  • Optimized data preprocessing and model tuning; stayed abreast of ML and time series trends.
  • Partnered with cross-functional teams for insights and model refinement.

Publications

LaTeX Rainbow: Universal LaTeX to PDF Document Semantic & Layout Annotation Framework

Changxu Duan, Zhiyin Tan, Sabine Bartsch, In Proceedings of the second Workshop on Information Extraction from Scientific Publications at IJCNLP-AACL 2023.

Presenting an Annotation Pipeline for Fine-grained Linguistic Analyses of Multimodal Corpora

Elena Volkanovska, Sherry Tan, Changxu Duan, Debajyoti Paul Chowdhury, Sabine Bartsch, In Proceedings of the first Workshop on Linguistic Insights from and for Multimodal Language Processing at KONVENS 2023.

The InsightsNet Climate Change Corpus (ICCC)

Sabine Bartsch, Changxu Duan, Sherry Tan, Elena Volkanovska, Wolfgang Stille, Journal of Datenbank-Spektrum, pp. 1610-1995. 2023.

Projects & Tools

Explaining the idea of language model generation through Jabber and entity linking

Poster presentation at Machine Learning Operations Summer School 2022

  • Language model output is often hallucinatory and imprecise.
  • Attempted to validate generated output by prompting the language model to babble repeatedly on the same problem and by linking entities in the output to a knowledge graph.

Semi-supervised Event-centered Emotion Analysis and Performance Prediction

Master Thesis at Robert Bosch GmbH

Investigated the applicability of event-centered emotion analysis in semi-supervised learning (SSL).

  • Designed a similarity-based method to select unlabeled data and manage clustering attributes.
  • Identified key attributes of SSL Emotion Analysis tasks for performance prediction.

SPARQL Autocompletion

A language server tailored for enhancing SPARQL query functionalities.

  • Syntax Highlighting: improved the syntax visualization of the VSCode SPARQL extension.
  • Advanced Features: auto-completion for prefixes & IntelliSense for classes and properties.

CRNN Speech Emotion Detector

A Neural Network to predict emotion from MFCC features of time-series speech data.

  • Based on the VAD (Valence-Arousal-Dominance) model, framed as a regression task.
  • Model consists of a time-distributed 1D CNN and a GRU, implemented in Keras.
  • The accuracy of the model is about 71%.

Wiener filter for microphone arrays

Speech enhancement and noise reduction on a 6-microphone array system in MATLAB.

  • Determined the speaking direction from the sound delays recorded across the microphone array.
  • Filtered background noise and enhanced speech from the speaking direction with Wiener filters.