ICDAR 2025 ∙ Long Paper

Layout-Aware Text Editing for Efficient Transformation of Academic PDFs to Markdown

Author(s): Changxu Duan

[Code] [PDF (Coming Soon)]

Overview

Academic PDFs are difficult to convert into structured formats because of their complex layouts, which mix figures, tables, equations, and densely packed text. Existing vision-language models (e.g., Nougat, olmOCR) typically regenerate the entire content from scratch, even when much of the text could simply be reused.

In this work, we present EditTrans, a layout-aware editing framework that significantly accelerates PDF-to-Markdown conversion. By distinguishing content that can be reused from content that must be generated, EditTrans reduces redundant computation and directs the vision-language model to generate only what is necessary.
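The reuse-versus-generate idea can be illustrated with a minimal sketch. It is not the EditTrans implementation: the `Segment` type, the three action labels, and the `stub_generate` function are hypothetical names introduced here, and the sketch assumes that some layout-aware classifier has already labeled each text segment. Only segments marked for generation reach the (stubbed) model; everything else is copied or dropped.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical action labels; the paper's actual label set may differ.
KEEP, DELETE, GENERATE = "keep", "delete", "generate"


@dataclass
class Segment:
    text: str    # text read from the PDF text layer
    action: str  # one of KEEP, DELETE, GENERATE


def apply_edits(
    segments: List[Segment],
    generate: Callable[[Segment], str],
) -> Tuple[str, int]:
    """Assemble Markdown, calling the generator only where needed.

    Returns the Markdown string and the number of generator calls.
    """
    parts: List[str] = []
    calls = 0
    for seg in segments:
        if seg.action == KEEP:
            # Reusable text is copied verbatim, with no decoding cost.
            parts.append(seg.text)
        elif seg.action == GENERATE:
            # Only these segments are sent to the model.
            parts.append(generate(seg))
            calls += 1
        elif seg.action != DELETE:
            raise ValueError(f"unknown action: {seg.action!r}")
    return "\n\n".join(parts), calls


def stub_generate(seg: Segment) -> str:
    """Stand-in for a vision-language model decoding one region."""
    return "$$E = mc^2$$"


# Made-up page content, for illustration only.
segments = [
    Segment("Mass and energy are equivalent.", KEEP),
    Segment("Page 3 of 12", DELETE),
    Segment("E = mc 2", GENERATE),
    Segment("This relation holds in the rest frame.", KEEP),
]
markdown, calls = apply_edits(segments, stub_generate)
print(markdown)
```

In this toy page, the generator is called once for four segments. The savings in a real system would depend on how much of each page the classifier marks as reusable.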

Empirically, EditTrans reduces transformation latency by up to 44% and saves over 43% of decoding steps on datasets such as arXiv and Quantum Physics, without compromising output quality. In most cases, it even slightly improves fidelity, as measured by edit distance, F1 score, and translation metrics. These gains make EditTrans a practical and scalable solution for efficient, accessible transformation of scholarly content.
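For readers unfamiliar with the edit-distance metric mentioned above, the following is a minimal sketch of a character-level Levenshtein distance and one common way to normalize it. The normalization by the longer string's length is an assumption; the paper's evaluation code may normalize differently.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn `a` into `b`."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute or match
            ))
        prev = cur
    return prev[-1]


def normalized_edit_distance(pred: str, ref: str) -> float:
    """Edit distance scaled to [0, 1]; lower means closer to the reference."""
    if not pred and not ref:
        return 0.0
    return levenshtein(pred, ref) / max(len(pred), len(ref))
```

A score of 0 means the predicted Markdown matches the reference exactly, and a score of 1 means no characters align.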

Overview of EditTrans: A layout-aware editing framework for efficient PDF-to-Markdown conversion.

How it works:

How to Cite

@inproceedings{duan-2025-editrans,
    author    = {Changxu Duan},
    title     = {Layout-Aware Text Editing for Efficient Transformation of Academic PDFs to Markdown},
    booktitle = {Proceedings of the 19th International Conference on Document Analysis and Recognition (ICDAR)},
    year      = {2025},
    month     = {September},
    address   = {Wuhan, China}
}