How to Compress Scanned PDFs (Best Method)

This guide explains how to clean noisy scans, run OCR, and compress the PDF so it behaves like a native digital document.

13 min read · Updated January 12, 2025

Scanned PDFs behave differently

Unlike digital PDFs that contain selectable text and vector art, a scanned PDF is essentially a stack of images. Each page might be a 300 DPI JPEG inside the PDF container, which explains why a 15-page packet can weigh 40 MB. Compressing a scan with settings meant for text-based PDFs tends to disappoint: there is no text or vector data to optimize, nearly all of the file size is pixel data, and the compressor cannot tell text from background noise on its own.
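To see where a figure like 40 MB comes from, here is a rough back-of-the-envelope sketch. It assumes US Letter pages, 24-bit color, and a JPEG compression ratio of about 10:1; the ratio is an illustrative assumption, since real scans vary widely with content and scanner settings.

```python
def estimate_scan_size_mb(pages, width_in=8.5, height_in=11.0, dpi=300,
                          bytes_per_pixel=3, compression_ratio=10.0):
    """Estimate the size of a scanned PDF in megabytes (1 MB = 1,000,000 bytes).

    compression_ratio is an assumed average for JPEG pages, not a measured value.
    """
    pixels_per_page = round(width_in * dpi) * round(height_in * dpi)
    raw_bytes_per_page = pixels_per_page * bytes_per_pixel
    return pages * raw_bytes_per_page / compression_ratio / 1_000_000


# 15 Letter-size pages at 300 DPI, 24-bit color, ~10:1 JPEG
print(f"{estimate_scan_size_mb(15):.1f} MB")           # 37.9 MB
# The same packet at 150 DPI has a quarter of the pixels
print(f"{estimate_scan_size_mb(15, dpi=150):.1f} MB")  # 9.5 MB
```

Halving the DPI cuts the pixel count to a quarter, which is why resolution is the biggest single lever on scan size.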

The key is to remove imperfections before compression and to preserve the OCR text layer so the document remains searchable. The workflow below covers both goals.

Prep the scans before uploading

If you still have the raw scans, adjust contrast and crop edges inside your scanning app. Eliminating dark borders and shadow gradients lowers the amount of data that needs to be stored. Many mobile scanning apps include “document” filters that whiten backgrounds and sharpen lines; turn those on before exporting the PDF.

When you receive a scanned PDF from someone else, duplicate it and run the duplicate through an image editor such as Preview or Photoshop to remove stray marks before compression.
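The effect of whitening the background can be shown with a small experiment. The sketch below builds a synthetic grayscale "page" (an assumption for illustration, not a real scan) and uses zlib's Deflate as a stand-in for the lossless compression a PDF can apply to images. Real scanning apps use more sophisticated filters, but the principle is the same: uniform backgrounds compress far better than noisy ones.

```python
import random
import zlib


def whiten_background(pixels, threshold=200):
    """Set every grayscale value at or above `threshold` to pure white (255)."""
    return bytes(255 if p >= threshold else p for p in pixels)


# Synthetic 200x200 grayscale page: noisy off-white paper with a few dark "text" rows.
random.seed(0)
width, height = 200, 200
page = bytearray()
for y in range(height):
    for x in range(width):
        if y % 20 == 0:
            page.append(random.randint(0, 40))     # dark row standing in for text
        else:
            page.append(random.randint(210, 245))  # paper with sensor noise

noisy_size = len(zlib.compress(bytes(page)))
clean_size = len(zlib.compress(whiten_background(page)))
print(f"noisy: {noisy_size} bytes, whitened: {clean_size} bytes")
```

The dark rows are left untouched, so the "text" survives while the paper noise, which carried no information, stops costing bytes.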

Use OCR to reduce size

Optical Character Recognition (OCR) makes text searchable, and some OCR tools also separate text from background so each can be compressed differently. Tools such as Adobe Acrobat, ABBYY FineReader, or free cloud OCR apps can analyze each page and create a hidden text layer. The text layer itself adds a small amount of data, so the PDF shrinks at this stage only if the OCR tool also re-encodes the page images in the same pass. Compare the file size before and after to see what your tool does.

MyPDFHero respects existing OCR layers, so you can safely compress after running OCR elsewhere. The result is a readable, searchable PDF that is light enough for portals.
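If you prefer to script the OCR step, one option is the open-source OCRmyPDF command-line tool. The article does not name it, so treat it as an assumption: the sketch below presumes OCRmyPDF is installed separately and only builds the command, and the file names are placeholders.

```python
import shutil
import subprocess


def build_ocr_command(src, dst, language="eng", deskew=True, skip_text=True):
    """Build an OCRmyPDF command line as a list of arguments.

    --deskew straightens crooked pages; --skip-text leaves pages that
    already have a text layer untouched.
    """
    cmd = ["ocrmypdf", "--language", language]
    if deskew:
        cmd.append("--deskew")
    if skip_text:
        cmd.append("--skip-text")
    cmd += [str(src), str(dst)]
    return cmd


def run_ocr(src, dst, **options):
    """Run OCRmyPDF, failing early if it is not installed."""
    if shutil.which("ocrmypdf") is None:
        raise RuntimeError("ocrmypdf is not installed or not on PATH")
    subprocess.run(build_ocr_command(src, dst, **options), check=True)


# Placeholder file names; nothing is executed here, the command is only printed.
print(" ".join(build_ocr_command("scan.pdf", "scan-ocr.pdf")))
```

Writing to a new output file rather than overwriting the input keeps the original scan available if the OCR result needs to be redone.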

Compress with scan-aware settings

Upload the scanned PDF to MyPDFHero and let the tool detect that it contains bitmap pages. The compressor targets color noise, reduces gradients, and smooths out backgrounds while protecting stamps or signatures. It will never drop the DPI below legible thresholds for government or academic submissions.

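If you still need to go smaller, export the pages as individual images, then rebuild the document with the JPG to PDF tool. This two-step approach rebuilds the PDF with even cleaner assets. A PDF rebuilt from images has no text layer, so run OCR again afterward if the file needs to stay searchable.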

Archive high-res originals separately

Always keep an untouched copy of the scan in case you need to re-run OCR or prove authenticity. Store it in a secure cloud folder with restricted permissions. Use the compressed copy for day-to-day sharing so you do not clog inboxes.

Label the files clearly, e.g., “passport-scan-master.pdf” versus “passport-scan-email.pdf”, so colleagues know which version to grab.
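A small helper can keep the naming consistent and record a checksum of the master copy. This is a sketch, not part of any MyPDFHero feature: the function names are made up for illustration, and a SHA-256 checksum only shows that a file has not changed since you recorded it, which supports but does not replace other proof of authenticity.

```python
import hashlib
from pathlib import Path


def versioned_names(stem, suffix=".pdf"):
    """Return (master_name, sharing_name) for a scan, following the naming above."""
    return f"{stem}-master{suffix}", f"{stem}-email{suffix}"


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


master, email = versioned_names("passport-scan")
print(master, email)  # passport-scan-master.pdf passport-scan-email.pdf
```

Store the digest returned by `sha256_of` next to the master file; recomputing it later and getting the same value confirms the archive copy is byte-for-byte unchanged.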

Security best practices for scanned IDs

Scanned IDs and contracts often contain sensitive information. Compress them locally or via a zero-login tool like MyPDFHero to minimize exposure. When sending to third parties, password-protect the PDF or share through encrypted portals if policy requires it.

Always verify that the platform storing your originals meets compliance standards (SOC 2, ISO 27001, etc.).

Step-by-step workflow

Follow these practical steps inside MyPDFHero or your operating system to complete the task quickly.

  1. Clean the scan

    Crop edges, adjust contrast, and remove background textures before doing anything else.

  2. Run OCR

    Use Acrobat, Google Drive, or another OCR tool to add a text layer for searchability.

  3. Upload to MyPDFHero

    Drop the scanned PDF into the compressor and let it detect the best settings automatically.

  4. Check legibility

    Preview the output and zoom to 200% to confirm stamps, seals, and signatures look sharp.

  5. Share securely

    Send the compressed PDF via email, secure portal, or encrypted storage, then delete temporary files.

Frequently asked questions

Why are scanned PDFs so large?

Each page is a separate bitmap image, often captured at high DPI with color noise. This uses far more data than text-based PDFs.

Can I compress a scanned PDF without OCR?

Yes. Compression works on the page images and does not need a text layer. OCR is still worth running whenever possible, because it makes the document searchable and accessible to screen readers.

What DPI should I target for IDs or passports?

Keep critical documents at 300 DPI after compression unless the receiving organization specifies otherwise. Anything lower might blur microtext or the fine detail in seals.
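To check what resolution a compressed page actually ended up at, divide the image's pixel width by the page width in inches. PDF page dimensions are measured in points, with 72 points to the inch. The sketch below assumes you have already read both numbers from your PDF viewer or tool of choice.

```python
def effective_dpi(pixel_width, page_width_pt):
    """Effective resolution of a page image; PDF points are 1/72 inch."""
    page_width_in = page_width_pt / 72.0
    return pixel_width / page_width_in


# US Letter is 612 points (8.5 inches) wide.
print(round(effective_dpi(2550, 612)))  # 300
print(round(effective_dpi(1275, 612)))  # 150
```

If the result falls below your target after compression, recompress from the master copy with gentler settings rather than upscaling the small file.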

Is it safe to use online tools with IDs?

Use tools that do not require login, transmit via HTTPS, and delete files within minutes. Always read their privacy statement.

How do I reduce background gray noise?

Run the scans through a “document” or “black-and-white” filter before exporting. This removes gradients that waste megabytes.
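The savings from a black-and-white filter come largely from bit depth. As a rough illustration, this sketch compares the uncompressed size of one US Letter page at 300 DPI in three color modes; actual file sizes will be smaller after compression, but the proportions give a sense of scale.

```python
def raw_page_bytes(width_px, height_px, bits_per_pixel):
    """Uncompressed size of one page image, with each row padded to a whole byte."""
    row_bytes = (width_px * bits_per_pixel + 7) // 8
    return row_bytes * height_px


# One US Letter page at 300 DPI is 2550 x 3300 pixels.
for label, bits in [("24-bit color", 24),
                    ("8-bit grayscale", 8),
                    ("1-bit black-and-white", 1)]:
    print(f"{label}: {raw_page_bytes(2550, 3300, bits):,} bytes")
# 24-bit color: 25,245,000 bytes
# 8-bit grayscale: 8,415,000 bytes
# 1-bit black-and-white: 1,052,700 bytes
```

Black-and-white suits plain text pages; keep grayscale or color for photos, stamps, and IDs where tonal detail matters.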

Can I batch process multiple scanned PDFs?

Process them sequentially to maintain control. For high volumes, automate OCR and pre-cleaning, then compress one file at a time.
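Sequential processing is simple to script. In the sketch below, `handler` is a hypothetical placeholder for whatever you do with each file (pre-cleaning, OCR, or handing off to a compressor); the loop itself only guarantees that files are handled one at a time, in a predictable order, and that one bad file does not stop the rest.

```python
from pathlib import Path


def process_sequentially(folder, handler, pattern="*.pdf"):
    """Apply `handler` to each matching file, one at a time, in name order.

    Returns (results, failures): dicts keyed by file name, holding the
    handler's return value or the error message respectively.
    """
    results, failures = {}, {}
    for pdf in sorted(Path(folder).glob(pattern)):
        try:
            results[pdf.name] = handler(pdf)
        except Exception as exc:  # record the failure and move on to the next file
            failures[pdf.name] = str(exc)
    return results, failures


# Example use (assumes a local folder named "scans"):
#   sizes, errors = process_sequentially("scans", lambda p: p.stat().st_size)
```

Reviewing the `failures` dictionary after a run shows exactly which scans need manual attention.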
