Layout-Aware Text Editing for Efficient Transformation of Academic PDFs to Markdown

ICDAR 2025 ∙ Long Paper

Author(s): Changxu Duan

[Code] [PDF (Coming Soon)]

Overview

Academic PDFs are difficult to convert into structured formats because their complex layouts mix figures, tables, equations, and densely packed text. Existing vision-language models (e.g., Nougat, olmOCR) typically regenerate the entire content from scratch, even when much of the text could simply be reused.

In this work, we present EditTrans, a layout-aware editing framework that significantly accelerates PDF-to-Markdown conversion. By distinguishing reusable content from content that must be generated, EditTrans reduces redundant computation and directs vision-language models to generate only what is necessary.

Empirically, EditTrans reduces transformation latency by up to 44% and saves over 43% of decoding steps on datasets such as arXiv and Quantum Physics, without compromising output quality. In most cases, it even slightly improves fidelity, as measured by edit distance, F1 score, and translation metrics. These gains make EditTrans a practical, scalable solution for efficient and accessible transformation of scholarly content.
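To make the edit-distance fidelity measure concrete, here is a minimal sketch of one common variant: Levenshtein distance divided by the length of the longer string. The normalization and the function names are assumptions for illustration; this page does not state the exact definition used in the paper.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn `a` into `b`."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of `a`
    for i, ca in enumerate(a, start=1):
        cur = [i]  # turning a[:i] into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        prev = cur
    return prev[-1]


def normalized_edit_distance(pred: str, ref: str) -> float:
    """Assumed normalization: divide by the longer length, so the result
    lies in [0, 1] and lower means closer to the reference."""
    if not pred and not ref:
        return 0.0
    return levenshtein(pred, ref) / max(len(pred), len(ref))
```

Under this definition, a predicted Markdown string identical to the reference scores 0.0, and a completely different string of the same length scores 1.0.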

Overview of EditTrans: A layout-aware editing framework for efficient PDF-to-Markdown conversion.

How it works:

1. Document Analysis: Starting with an academic PDF, the system identifies and separates reusable plain text from complex content such as figures, tables, and mathematical formulas.
2. Placeholder Insertion: The reusable text is preserved as-is, while placeholders ([MARK]) are inserted in locations where transformation is needed.
3. Context-Aware Generation: Using both the original document (converted to PNG) and the placeholder-annotated text, a vision-language model (VLM) fills in the [MARK] regions with appropriately formatted content, such as LaTeX for formulas or HTML for tables.
4. Markdown Output: The final output is a clean, well-structured Markdown document that combines directly reused content with high-fidelity generated elements, optimized for both readability and computational efficiency.
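The steps above can be sketched as a toy pipeline. This is an illustrative sketch under stated assumptions, not the EditTrans implementation: the `Block` type, its `kind` labels, and the helpers `insert_placeholders`, `fill_placeholders`, and `stub_vlm` are hypothetical names, and the VLM call is replaced by a stub that returns a fixed LaTeX snippet.

```python
from dataclasses import dataclass

MARK = "[MARK]"  # placeholder token named in the steps above


@dataclass
class Block:
    kind: str  # hypothetical labels: "text", "formula", "table", "figure"
    text: str  # raw text extracted from the PDF for this block


def insert_placeholders(blocks):
    """Keep plain text verbatim; replace every other block with MARK."""
    return "\n\n".join(b.text if b.kind == "text" else MARK for b in blocks)


def stub_vlm(page_png, template):
    """Stand-in for a vision-language model (assumption): returns one
    generated snippet per MARK instead of reading the page image."""
    return ["$$E = mc^2$$"] * template.count(MARK)


def fill_placeholders(template, generated):
    """Substitute the i-th MARK with the i-th generated snippet."""
    parts = template.split(MARK)
    if len(parts) - 1 != len(generated):
        raise ValueError("number of snippets must match number of placeholders")
    out = [parts[0]]
    for snippet, tail in zip(generated, parts[1:]):
        out.extend([snippet, tail])
    return "".join(out)


# Toy document: two reusable text blocks around one garbled formula.
blocks = [
    Block("text", "Energy and mass are related by"),
    Block("formula", "E = mc 2"),
    Block("text", "as shown by Einstein."),
]
template = insert_placeholders(blocks)
markdown = fill_placeholders(template, stub_vlm(b"", template))
print(markdown)
```

In this sketch only the formula is generated; the two text blocks are copied through unchanged, which is where the savings in decoding steps would come from.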

How to Cite

@inproceedings{duan-2025-editrans,
    author    = {Changxu Duan},
    title     = {Layout-Aware Text Editing for Efficient Transformation of Academic PDFs to Markdown},
    booktitle = {Proceedings of the 19th International Conference on Document Analysis and Recognition (ICDAR)},
    year      = {2025},
    month     = {September},
    address   = {Wuhan, China}
}