
Understanding PDF Compression: A Technical Deep Dive

Learn how PDF compression algorithms work, why file sizes vary dramatically, and how to achieve the optimal balance between size and quality.

18 min read · Technical Guide · Updated Feb 2026

You have a 15 MB PDF that needs to be under 1 MB for a government portal. You run it through a compression tool and get an 800 KB file. But what actually happened? How did the file shrink by about 95%? And more importantly, what did you lose in the process?

In this guide, we will explore the technical foundations of PDF compression—not to turn you into a computer scientist, but to give you the knowledge to make informed decisions when compressing your documents.

What Is Actually Inside a PDF?

Before we discuss compression, we need to understand what makes up a PDF file. A PDF is essentially a container that holds several types of content:

Text Content

The actual characters, along with information about their position on the page. Typically a small portion of file size.

Images

Photos, graphics, logos. Often the largest contributors to file size.

Fonts

Embedded font files that ensure text displays correctly on any device.

Metadata & Structure

Document information, bookmarks, links, form fields, and internal structure.

A typical document PDF might be 60% images, 25% fonts, 10% text, and 5% metadata. A scanned document, however, might be 99% images. Understanding your PDF's composition helps you understand what compression can achieve.
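
To see why composition matters, the overall saving can be estimated as a weighted sum of per-component savings. The sketch below uses the typical shares quoted above; the per-component savings (90% for images, 50% for fonts) are hypothetical numbers chosen only for illustration.

```python
def overall_reduction(composition, reductions):
    """Return the fraction of total file size saved.

    composition: share of the file per component (fractions summing to 1).
    reductions: fraction by which each component shrinks (missing = unchanged).
    """
    return sum(share * reductions.get(name, 0.0)
               for name, share in composition.items())


# Shares taken from the typical document described above.
typical = {"images": 0.60, "fonts": 0.25, "text": 0.10, "metadata": 0.05}

# Hypothetical per-component savings, for illustration only.
savings = {"images": 0.90, "fonts": 0.50}

print(f"Overall reduction: {overall_reduction(typical, savings):.1%}")
```

Under these assumptions, even very aggressive image compression cannot shrink the whole file by more than the share that images occupy.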

Compression Fundamentals: Lossy vs. Lossless

Lossless Compression

Lossless compression reduces file size without losing any information. The original data can be perfectly reconstructed. Think of it like vacuum-packing clothes—the clothes take up less space, but when you unpack them, they are exactly the same.

Common lossless algorithms include:

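  • Flate/Deflate (the algorithm behind ZIP): Used for text, metadata, and other data streams in PDFs
  • LZW: An older general-purpose compression method that PDFs also support
  • PNG-style compression (Deflate combined with prediction filters): For images where exact reproduction is needed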

The limitation of lossless compression is that it has a ceiling. You cannot compress random data, and structured data has a theoretical minimum size based on its information content (entropy).
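
The ceiling is easy to observe with Python's standard `zlib` module, which implements Deflate. This is a minimal sketch on made-up data rather than a real PDF stream: repetitive text shrinks dramatically, while random bytes of the same length do not shrink at all.

```python
import os
import zlib

repetitive = b"Invoice total: 100.00 EUR\n" * 400   # highly structured data
random_data = os.urandom(len(repetitive))            # no structure to exploit

packed_text = zlib.compress(repetitive, level=9)
packed_random = zlib.compress(random_data, level=9)

# Lossless: decompressing returns exactly the original bytes.
assert zlib.decompress(packed_text) == repetitive
assert zlib.decompress(packed_random) == random_data

print("repetitive:", len(repetitive), "->", len(packed_text), "bytes")
print("random:    ", len(random_data), "->", len(packed_random), "bytes")
```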

Lossy Compression

Lossy compression achieves smaller sizes by permanently discarding some data. The discarded data is chosen to be "less important"—information that humans are less likely to notice is missing.

For images, this typically means:

  • Reducing color precision (e.g., from millions of colors to thousands)
  • Averaging nearby pixels together
  • Removing high-frequency details (fine textures, subtle gradients)

JPEG is the most common lossy image format. When you see "JPEG quality 80%" versus "JPEG quality 50%", the lower number means more data is discarded, resulting in smaller files but potentially visible artifacts.
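
The idea of trading precision for size can be sketched without any imaging library. The example below is not JPEG (which works on frequency coefficients of small pixel blocks); it only reduces the precision of each sample in a synthetic grayscale image and then applies the same lossless compressor to both versions. The image, the noise range, and the step size of 32 are all assumptions made for the demonstration.

```python
import random
import zlib

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Synthetic 64x64 grayscale "image": a horizontal gradient plus mild noise.
pixels = bytes(min(255, max(0, x * 4 + rng.randint(-6, 6)))
               for _ in range(64) for x in range(64))


def quantize(data, step):
    """Round every sample down to a multiple of `step`, discarding detail."""
    return bytes((value // step) * step for value in data)


coarse = quantize(pixels, 32)   # 256 possible levels reduced to 8

original_size = len(zlib.compress(pixels, 9))
lossy_size = len(zlib.compress(coarse, 9))
max_error = max(abs(a - b) for a, b in zip(pixels, coarse))

print("compressed size before quantizing:", original_size)
print("compressed size after quantizing: ", lossy_size)
print("largest per-pixel error introduced:", max_error)
```

The quantized data compresses far better, but the discarded detail cannot be recovered from `coarse`, which is what makes the step lossy.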

Image Compression in PDFs

Since images are typically the largest component of PDFs, image compression is crucial. Here is how it works:

Resolution Reduction (Downsampling)

A 3000x4000 pixel image contains 12 million pixels. If that image only needs to display at 300x400 in the final document, we are wasting storage on 11.88 million unnecessary pixels.

Downsampling reduces the number of pixels, which dramatically reduces file size. The key is choosing the right target resolution:

  • 300 DPI: Professional print quality
  • 150 DPI: Good quality for standard printing
  • 72-96 DPI: Screen viewing only
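
The target resolution translates into pixel dimensions once the printed size is known: pixels = inches × DPI. The sketch below assumes a hypothetical placement of the 3000x4000 photo at 6 x 8 inches on the page; the function name is illustrative.

```python
import math


def target_pixels(width_in, height_in, dpi):
    """Pixels needed to cover a printed area at the given DPI."""
    return math.ceil(width_in * dpi), math.ceil(height_in * dpi)


source_w, source_h = 3000, 4000   # the photo from the example above

for dpi in (300, 150, 96):
    w, h = target_pixels(6, 8, dpi)   # assumed 6 x 8 inch placement
    kept = (w * h) / (source_w * source_h)
    print(f"{dpi} DPI: {w}x{h} pixels, {kept:.1%} of the original pixel count")
```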

Example Calculation

A 3000x4000 pixel photo at 24-bit color = ~36 MB uncompressed
Downsampled to 600x800 pixels = ~1.4 MB uncompressed
JPEG compressed at 80% quality = ~150 KB
Total reduction: 99.6%
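
The arithmetic behind these figures can be checked in a few lines. The helper names are illustrative, and the 150 KB JPEG size is the estimate quoted above, not a measured result.

```python
def raw_size_bytes(width, height, bits_per_pixel=24):
    """Uncompressed size of a bitmap in bytes."""
    return width * height * bits_per_pixel // 8


def reduction(original, compressed):
    """Fraction of the original size that was removed."""
    return 1 - compressed / original


full = raw_size_bytes(3000, 4000)    # 36,000,000 bytes, about 36 MB
small = raw_size_bytes(600, 800)     # 1,440,000 bytes, about 1.4 MB
jpeg = 150_000                       # estimated JPEG size from the example

print(f"Downsampling alone: {reduction(full, small):.1%}")
print(f"Downsampling + JPEG: {reduction(full, jpeg):.1%}")
```

Downsampling by a factor of five on each axis removes 96% of the data on its own; the JPEG step accounts for the remainder of the 99.6%.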

Color Space Optimization

Images can use different color models:

  • CMYK (4 channels): For professional printing, about 33% larger than RGB before compression
  • RGB (3 channels): Standard for screens and most uses
  • Grayscale (1 channel): About 67% smaller than RGB before compression, suited to black-and-white images

Converting a color image to grayscale (when appropriate) is one of the most effective compression techniques.
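
The size differences follow directly from the channel count. The sketch below computes uncompressed sizes at 8 bits per channel and converts a single RGB pixel to gray using the common ITU-R BT.601 luma weights; real tools may use a different conversion, so treat the weights as one reasonable assumption.

```python
CHANNELS = {"CMYK": 4, "RGB": 3, "Grayscale": 1}


def raw_bytes(width, height, color_space, bits_per_channel=8):
    """Uncompressed image size in bytes for the given color model."""
    return width * height * CHANNELS[color_space] * bits_per_channel // 8


def to_gray(r, g, b):
    """Approximate perceived brightness of an RGB pixel (BT.601 weights)."""
    return round(0.299 * r + 0.587 * g + 0.114 * b)


rgb = raw_bytes(600, 800, "RGB")
for space in CHANNELS:
    size = raw_bytes(600, 800, space)
    print(f"{space}: {size} bytes ({size / rgb - 1:+.0%} vs RGB)")

print("pure red as gray:", to_gray(255, 0, 0))
```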

Font Optimization: Subsetting and Embedding

Fonts can significantly impact PDF size. A full font file might contain thousands of glyphs (characters, symbols, etc.) for multiple languages, but your document might only use 50 unique characters.

Font Subsetting

Font subsetting creates a custom font file containing only the characters actually used in the document. If your document uses only English letters, numbers, and basic punctuation, the subset font might be 95% smaller than the full font.
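
A rough feel for the potential saving comes from counting the distinct characters a document actually uses. This sketch only counts characters; a real subsetter also has to handle ligatures, composite glyphs, and font tables, and glyph count is not proportional to byte size. The 2,000-glyph font is a hypothetical figure.

```python
def glyphs_needed(text):
    """Distinct printable characters a subset font would have to cover."""
    return sorted({ch for ch in text if ch.isprintable()})


def subset_share(text, glyphs_in_full_font):
    """Rough share of the full font's glyphs that the document uses."""
    return len(glyphs_needed(text)) / glyphs_in_full_font


sample = "Hello, World!"
print(glyphs_needed(sample))
print(f"{subset_share(sample, 2000):.2%} of a hypothetical 2,000-glyph font")
```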

Font Embedding Options

  • Full embedding: Largest size, maximum compatibility
  • Subset embedding: Smaller size, cannot edit to add new characters
  • No embedding: Smallest size, but text may display incorrectly if the reader does not have the font installed

The Quality vs. Size Tradeoff

There is no "free lunch" in compression. Every reduction in file size comes with a cost—either computational (processing time) or quality. Here is a general guide:

Compression Level   Typical Reduction   Quality Impact                        Best For
Light               20-40%              Imperceptible                         Archiving, printing
Medium              50-70%              Minor artifacts on close inspection   Email, general use
High                70-90%              Visible quality loss                  Web upload, size-limited portals
Maximum             90-98%              Significant quality loss              Strict size requirements
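
The table can be turned into a quick lookup: compute the reduction a target size requires, then pick the lightest level whose typical range reaches it. This is a rough sketch built on the ranges above; the function names are illustrative, and actual results depend on the document's content.

```python
# (level, low end, high end) of the typical reduction ranges in the table.
LEVELS = [
    ("Light", 0.20, 0.40),
    ("Medium", 0.50, 0.70),
    ("High", 0.70, 0.90),
    ("Maximum", 0.90, 0.98),
]


def required_reduction(original_bytes, target_bytes):
    """Fraction of the file that must be removed to hit the target size."""
    return max(0.0, 1 - target_bytes / original_bytes)


def suggest_level(original_bytes, target_bytes):
    """Return the lightest level whose upper bound reaches the target, or None."""
    needed = required_reduction(original_bytes, target_bytes)
    for name, _low, high in LEVELS:
        if needed <= high:
            return name
    return None   # beyond what the table's typical ranges cover


MB = 1_000_000
print(suggest_level(15 * MB, 1 * MB))   # the government-portal scenario
print(suggest_level(10 * MB, 5 * MB))
```

For the 15 MB file that must fit under 1 MB, the required reduction is about 93%, which only the Maximum level typically reaches.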

Practical Compression Tips

  1. Know your target: Understand the exact requirements (file size, dimensions, format) before compressing.
  2. Start with the original: Always compress from the highest-quality source. Compressing an already-compressed file yields worse results.
  3. Preview before saving: Check compressed output for acceptable quality before deleting originals.
  4. Consider the content: Text and fonts compress well without any quality loss, while image-heavy documents offer the largest savings but usually at some cost in quality.
  5. Use appropriate tools: Different tools have different algorithms and quality/size tradeoffs.


Frequently Asked Questions

What is the difference between lossy and lossless compression?

Lossless compression reduces file size without losing any data—the original can be perfectly reconstructed. Lossy compression discards some data permanently to achieve smaller sizes. For PDFs, text uses lossless compression while images often use lossy compression.

Why does my PDF become blurry after compression?

This happens when images in the PDF are compressed too aggressively using lossy methods. The compression algorithm discards image data to reduce file size, resulting in visible quality loss. Using a higher quality setting or less aggressive compression will help.

Can I compress a PDF multiple times?

Technically yes, but each round of lossy compression degrades quality further. It is better to start from the original PDF and compress once to your target size. If you only have an already-compressed PDF, further compression will yield diminishing returns with more quality loss.

Why is my scanned PDF so large?

Scanned PDFs contain images of each page rather than actual text. These images are typically high-resolution and stored with little or no compression. Compression can significantly reduce scanned PDF sizes, but the achievable quality depends on the original scan resolution.

What is the smallest I can compress a PDF without losing quality?

This depends on the PDF content. Text-only PDFs can be compressed significantly with no quality loss. PDFs with images have a lower limit: reducing images below about 150 DPI typically causes noticeable quality loss when printed, and going below 72-96 DPI becomes noticeable on screen as well.