New tool turns presentation videos into searchable PDF summaries

PV2DOC organizes both audio and visual data from presentation videos into structured PDF documents, making the content easier to understand and access. Credit: Associate Professor Hyuk-Yoon Kwon from Seoul National University of Science and Technology.

Watching presentation videos can be a great way to learn, but they often take up a lot of time and storage space.

Finding specific information means watching the entire video or skipping around, which can be frustrating.

To solve these problems, researchers from Seoul National University of Science and Technology, led by Professor Hyuk-Yoon Kwon, have created a tool called PV2DOC.

The software converts presentation-style videos into summarized, searchable PDF documents.

The research was published in the journal SoftwareX on December 1, 2024.

PV2DOC stands out because it uses both the video’s visual and audio data to create clear, organized summaries.

Because it transcribes the audio itself, the tool works even when only the video is available, unlike tools that require a separate transcript.

  1. Visual Data Processing:
    PV2DOC samples the video at one frame per second. It uses the structural similarity index (SSIM) to identify important frames, such as slides with figures, tables, or graphs. Two object detection models, Mask R-CNN and YOLOv5, then locate objects in those frames. If an image is detected in fragments (e.g., split into smaller sections), PV2DOC merges the pieces into a single figure. Finally, the software applies optical character recognition (OCR) with the Tesseract engine to extract text from these images.
  2. Audio Data Processing:
    The tool extracts the audio track from the video and uses Whisper, OpenAI's open-source speech recognition model, to turn the spoken words into text. The transcript is then summarized with TextRank, a graph-based ranking algorithm that picks out the sentences carrying the main points of the presentation.
  3. Creating the Document:
    All the extracted text, figures, and data are organized into a Markdown document, which is then converted into a PDF. The final document mirrors the structure of the original video, with clear headings, summaries, and linked images or figures.
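
To make two of these steps more concrete, here is a minimal Python sketch of SSIM-based frame selection and TextRank-style sentence ranking. It is an illustration under stated assumptions, not PV2DOC's actual code: the function names, the 0.9 similarity threshold, and the rule "keep a frame when it differs enough from the last kept frame" are hypothetical. For brevity, SSIM is computed over a single global window (production implementations usually average over local windows), and sentence similarity uses a smoothed variant of the word-overlap measure from the original TextRank formulation.

```python
import math
import re


def global_ssim(x, y, dynamic_range=255.0):
    """SSIM of two equal-length grayscale frames (flat pixel lists), one global window."""
    n = len(x)
    c1 = (0.01 * dynamic_range) ** 2  # stabilizing constants from the SSIM definition
    c2 = (0.03 * dynamic_range) ** 2
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx * mx + my * my + c1) * (vx + vy + c2)
    )


def select_keyframes(frames, threshold=0.9):
    """Return indices of frames that differ from the last kept frame.

    Assumption: a frame counts as a new slide when its SSIM against the
    previously kept frame drops below `threshold` (hypothetical value).
    """
    if not frames:
        return []
    kept = [0]
    for i in range(1, len(frames)):
        if global_ssim(frames[kept[-1]], frames[i]) < threshold:
            kept.append(i)
    return kept


def textrank_summary(sentences, top_n=2, damping=0.85, iterations=50):
    """Rank sentences with a PageRank-style iteration; return top_n in original order."""
    words = [set(re.findall(r"[a-z0-9]+", s.lower())) for s in sentences]
    n = len(sentences)

    def similarity(i, j):
        # Word overlap normalized by log sentence lengths; the +1 is a
        # smoothing choice made here to avoid a zero denominator.
        denom = math.log(len(words[i]) + 1) + math.log(len(words[j]) + 1)
        return len(words[i] & words[j]) / denom if denom else 0.0

    weights = [[similarity(i, j) if i != j else 0.0 for j in range(n)] for i in range(n)]
    out_sum = [sum(row) for row in weights]
    scores = [1.0] * n
    for _ in range(iterations):
        scores = [
            (1 - damping)
            + damping * sum(
                weights[j][i] * scores[j] / out_sum[j] for j in range(n) if out_sum[j]
            )
            for i in range(n)
        ]
    best = sorted(range(n), key=lambda i: scores[i], reverse=True)[:top_n]
    return [sentences[i] for i in sorted(best)]
```

In a full pipeline, the frames would come from a video decoder and the sentences from the speech-to-text transcript; here both are plain Python lists so the sketch runs without extra dependencies.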

PV2DOC creates summaries that users can read in just a few minutes, saving time for people studying lecture videos or conference presentations. The structured PDF documents are easy to search and share, requiring much less storage space than video files.

“This software simplifies data storage and makes video content more accessible. It turns unstructured data into a structured format that’s easier to analyze,” said Prof. Kwon.

Looking ahead, the team plans to extend PV2DOC by training a large language model (LLM), similar to the one behind ChatGPT. The aim is to let users ask questions about a video’s content and get answers directly, making it even easier to find specific information.

PV2DOC is a step toward making the information in presentation videos easier to search, store, and share.