AnonGuide

Remove Metadata Before You Share Files: A Practical Guide

Photos, PDFs, and Office documents carry hidden metadata that can deanonymize you — GPS coordinates, author names, timestamps, device serials. Here's how to find and strip it with mat2, ExifTool, and Dangerzone, and where each tool falls short.

By Editorial · 8 min read

You can do everything else right — anonymous OS, Tor, a fresh pseudonym — and still get unmasked by a file you shared. The reason is metadata: data about data, the invisible information that programs quietly attach to the files you create. A photo can carry the exact GPS coordinates where it was taken. A PDF can carry your real name. An Office document can carry a revision history and the username of every machine that touched it.

This is one of the most common and most underestimated ways people deanonymize themselves. The good news is that it’s fixable with a few free, open-source tools and a habit. Let’s look at what’s hiding in your files, how to strip it, and — just as importantly — where the tools have limits you need to respect.

What Metadata Is Actually In Your Files

Different file types carry different baggage. The Tails project’s guide on the subject is blunt about the stakes: metadata “can deanonymize you or expose private information.” Here’s the short tour.

Images. Photos taken with a phone or camera embed EXIF metadata. That commonly includes the date and time, the camera or phone model and sometimes its serial number, the camera settings — and, if location services were on, the precise GPS coordinates of where the photo was taken. A single vacation snapshot can hand someone your home address.
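To make the precision concrete: EXIF stores latitude and longitude as degrees, minutes, and seconds plus a hemisphere letter. The sketch below converts that form into the decimal degrees a map search accepts. The helper name `dms_to_decimal` and the coordinates are made up for illustration. One arcsecond of latitude is roughly 31 metres, so the seconds field alone can narrow a photo down to a single building.

```python
from fractions import Fraction


def dms_to_decimal(degrees, minutes, seconds, ref):
    """Convert EXIF-style degrees/minutes/seconds to signed decimal degrees.

    `ref` is the hemisphere letter stored alongside the value:
    "N"/"S" for latitude, "E"/"W" for longitude.
    """
    if ref not in ("N", "S", "E", "W"):
        raise ValueError(f"unexpected hemisphere reference: {ref!r}")
    value = Fraction(degrees) + Fraction(minutes) / 60 + Fraction(seconds) / 3600
    # South and west are negative in decimal notation.
    if ref in ("S", "W"):
        value = -value
    return float(value)


# Made-up example values, not taken from a real photo.
lat = dms_to_decimal(40, 26, 46, "N")
lon = dms_to_decimal(79, 58, 56, "W")
print(f"{lat:.6f}, {lon:.6f}")
```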

PDFs. PDF files store an author field, the name of the software that created them, creation and modification timestamps, and sometimes a title or the path the file was saved to. A “Final_v3” PDF can quietly announce who wrote it and on what machine.

Office documents. Word, Excel, PowerPoint and their OpenDocument equivalents are the worst offenders. They can carry the author name, the company, a full revision history, comments, tracked changes, and the usernames of everyone who edited the file. People have been identified by the author field of a document they thought was anonymous.
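You can see some of this without any special tooling, because .docx, .xlsx, and .pptx files are ZIP archives whose core properties live in docProps/core.xml. The sketch below reads a few of those fields with Python's standard library. The function names and the sample values are hypothetical, and the in-memory sample is a stand-in archive, not a valid Word file. It does not look at comments, tracked changes, embedded files, or OpenDocument formats, so treat it as a peek, not an audit.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by the core-properties part of Office Open XML files.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}

# Friendly label -> namespaced tag to look up.
FIELDS = {
    "creator": "dc:creator",
    "last_modified_by": "cp:lastModifiedBy",
    "created": "dcterms:created",
    "modified": "dcterms:modified",
    "revision": "cp:revision",
}


def read_core_properties(source):
    """Return the core properties found in a .docx/.xlsx/.pptx path or file object."""
    with zipfile.ZipFile(source) as archive:
        try:
            raw = archive.read("docProps/core.xml")
        except KeyError:
            return {}  # no core-properties part in this archive
    root = ET.fromstring(raw)
    found = {}
    for label, tag in FIELDS.items():
        node = root.find(tag, NS)
        if node is not None and node.text:
            found[label] = node.text
    return found


def make_sample_archive():
    """Build a tiny in-memory archive with made-up properties for the demo."""
    core = (
        '<?xml version="1.0" encoding="UTF-8"?>'
        f'<cp:coreProperties xmlns:cp="{NS["cp"]}" xmlns:dc="{NS["dc"]}" '
        f'xmlns:dcterms="{NS["dcterms"]}">'
        "<dc:creator>Jane Smith</dc:creator>"
        "<cp:lastModifiedBy>jsmith-laptop</cp:lastModifiedBy>"
        "<cp:revision>7</cp:revision>"
        "</cp:coreProperties>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("docProps/core.xml", core)
    buffer.seek(0)
    return buffer


print(read_core_properties(make_sample_archive()))
```

Pointing `read_core_properties` at a real document path shows the same fields as they would reach a recipient.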

Everything else. Audio files carry tags such as ID3. Many other formats include a “last modified by” field or an embedded thumbnail. Even an innocuous-looking format can leak more than you expect.

The threat here isn’t abstract. If your threat model involves staying unlinked from a pseudonym, metadata is exactly the kind of mundane, technical leak that defeats otherwise careful operational security.

Step 1: Look Before You Strip

Before removing anything, see what’s there. ExifTool — the de-facto standard tool for reading metadata, maintained by Phil Harvey — will dump everything it can find:

exiftool photo.jpg

That prints the full tag list: camera, timestamps, GPS, the works. Run it on a file you’re about to share and you’ll usually be surprised by how much is in there.
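When the tag list runs long, it can help to filter it. ExifTool can emit JSON with `-json`, and `-G` prefixes each tag with its group name. The sketch below flags tags whose names look identifying. The hint list and function names are illustrative assumptions, and the sample record is hand-written in the shape of ExifTool's JSON, not real output. Only `run_exiftool_json` needs ExifTool installed, and the demo does not call it.

```python
import json
import subprocess

# Substrings that often mark identifying tags. This list is an illustrative
# assumption, not an exhaustive or official one.
SENSITIVE_HINTS = ("gps", "serial", "author", "creator", "owner", "artist")


def run_exiftool_json(path):
    """Run `exiftool -json -G <path>` and return the parsed list of records.

    Requires ExifTool on PATH; not exercised by the sample below.
    """
    result = subprocess.run(
        ["exiftool", "-json", "-G", str(path)],
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)


def flag_sensitive(records):
    """Return {source_file: {tag: value}} for tags whose names look identifying."""
    flagged = {}
    for record in records:
        hits = {
            tag: value
            for tag, value in record.items()
            if tag != "SourceFile"
            and any(hint in tag.lower() for hint in SENSITIVE_HINTS)
        }
        if hits:
            flagged[record.get("SourceFile", "?")] = hits
    return flagged


# Hand-written sample shaped like ExifTool's JSON output; values are made up.
SAMPLE = [
    {
        "SourceFile": "photo.jpg",
        "EXIF:Make": "ExampleCam",
        "EXIF:SerialNumber": "0000000",
        "EXIF:GPSLatitude": "40 deg 26' 46.00\" N",
        "File:FileType": "JPEG",
    }
]
print(flag_sensitive(SAMPLE))
```

A name-based filter like this only highlights the obvious fields. Anything it does not flag still deserves a read of the full dump.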

For a quick, all-at-once look from the command line, mat2 (the Metadata Anonymisation Toolkit) has a show mode:

mat2 --show document.pdf

This lists the metadata mat2 can detect and leaves the file unchanged. Use it as a sanity check.

Step 2: Strip Metadata From Images and Most Files With mat2

For day-to-day metadata removal across a wide range of formats, mat2 is the right default tool. It’s open source, it’s what the Tails operating system ships (wrapped in a graphical app called Metadata Cleaner), and it supports a long list of formats: JPEG, PNG, GIF, TIFF, WebP, PDF, the Office and OpenDocument families, MP3/FLAC and other audio, MP4 and other video, SVG, EPUB, ZIP and tar archives, torrents, and more.

The basic usage is simple:

mat2 photo.jpg

Two behaviors are worth knowing. First, mat2 does not edit in place — it writes a new file with .cleaned inserted before the extension, so photo.jpg becomes photo.cleaned.jpg and your original is untouched. Second, by default mat2 may alter the data itself to scrub metadata as thoroughly as possible — for example, re-compressing an image or making PDF text non-selectable. If you need to guarantee the file data is left intact and will accept that some metadata may remain, use lightweight mode:

mat2 -L photo.jpg

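That tradeoff is yours to make based on what matters more for the file in question: thorough cleaning, or leaving the file's content data exactly as it was. Lightweight mode still removes some metadata, so the output is not byte-for-byte identical to the input.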
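Because mat2 leaves the original in place, an easy mistake is to upload the untouched file instead of the cleaned copy. The sketch below mirrors the naming described above and refuses to hand back a path until the cleaned copy exists. Both function names are hypothetical, and the naming rule is an assumption based on the photo.jpg example, so check it against what your mat2 version writes.

```python
from pathlib import Path


def cleaned_path(original):
    """Return where mat2 is expected to write the cleaned copy.

    Mirrors the naming described above: `.cleaned` goes before the last
    extension, so `photo.jpg` maps to `photo.cleaned.jpg`.
    """
    p = Path(original)
    return p.with_name(f"{p.stem}.cleaned{p.suffix}")


def file_to_share(original):
    """Return the cleaned copy's path, refusing if it does not exist yet."""
    candidate = cleaned_path(original)
    if not candidate.is_file():
        raise FileNotFoundError(
            f"{candidate} not found; run mat2 on {original} before sharing"
        )
    return candidate


print(cleaned_path("photo.jpg").name)
print(cleaned_path("report.pdf").name)
```

Wiring `file_to_share` into whatever script uploads or attaches files makes sharing the raw original fail loudly.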

Step 3: Strip Metadata With ExifTool (and the PDF Caveat)

ExifTool can also remove metadata, not just read it. The documented way to strip everything is:

exiftool -all= photo.jpg

For images this works well. Note that ExifTool keeps the untouched original next to the cleaned file as photo.jpg_original unless you pass -overwrite_original, so make sure the backup is not what you share. For PDFs, there is a critical catch. ExifTool’s own documentation states that changes to PDF files are reversible, because “the original information is never actually deleted from the file.” ExifTool simply appends an update that hides the old metadata; the original is still recoverable. That means ExifTool alone is not a safe way to scrub a PDF that an adversary might examine.

The standard fix is to follow up with qpdf, which can rewrite the file and drop the now-unreferenced original objects:

exiftool -all= file.pdf
qpdf --linearize file.pdf file.cleaned.pdf

Or — simpler and less error-prone for PDFs — just use mat2, which handles this properly.
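Whichever route you take, a crude check is to search the output's raw bytes for strings you know should be gone, such as your name or machine name. The sketch below does that with a hypothetical helper, `leftover_strings`, and demonstrates it on a toy byte string that imitates an appended update. The toy is not a valid PDF. A hit means something leaked. An empty result proves nothing, because compressed or hex-encoded strings will not match.

```python
import tempfile
from pathlib import Path


def leftover_strings(path, needles):
    """Return the needles that still appear in the file's raw bytes.

    Each needle is tried as UTF-8 and as UTF-16BE, two encodings commonly
    seen for PDF text strings. Compressed or hex-encoded strings will not
    match, so an empty result is not proof that the file is clean.
    """
    data = Path(path).read_bytes()
    found = []
    for needle in needles:
        variants = (needle.encode("utf-8"), needle.encode("utf-16-be"))
        if any(variant in data for variant in variants):
            found.append(needle)
    return found


# Toy stand-in for an updated PDF (not a valid PDF): the old Author entry
# stays in the file, and a later appended section overrides it.
blob = (
    b"%PDF-1.7\n1 0 obj << /Author (Jane Smith) >> endobj\n%%EOF\n"
    b"1 0 obj << >> endobj\n%%EOF\n"
)
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.pdf"
    sample.write_bytes(blob)
    print(leftover_strings(sample, ["Jane Smith", "jsmith-laptop"]))
```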

Step 4: When the File Came From Someone Else — Dangerzone

The tools above are for files you created and want to clean before sharing. There’s a related but distinct risk: a file someone sent you that might be booby-trapped. For that, Dangerzone, an open-source tool from the Freedom of the Press Foundation, takes a different approach.

Dangerzone opens a potentially dangerous PDF, Office document, or image inside an offline sandbox, renders every page down to raw pixels, and then rebuilds a fresh, safe PDF from those pixels outside the sandbox. Because the output is reconstructed from pixel data, it carries none of the original file’s embedded code — and, as a side effect, none of the original document’s metadata survives the round trip either. Its primary purpose is defanging malware in untrusted documents, but the pixel-rebuild also produces a clean, metadata-free PDF.

It’s not a replacement for mat2 on your own files — re-rendering to pixels is lossy and overkill for routine cleaning — but it’s the right tool when you don’t trust the file’s origin.

The Limits You Must Respect

No metadata tool is magic, and overconfidence here is dangerous. Both mat2 and Tails state the same essential caveat plainly: there is no reliable way to detect and remove every possible piece of metadata in complex file formats. mat2’s documentation warns that a clean --show output does not prove a file is metadata-free. Tails notes that an Office document can contain embedded files that carry their own metadata which the cleaner can’t fully reach.

Practical consequences:

  • Don’t trust a single pass blindly for high-stakes files in complex formats (especially Office documents and PDFs with embedded objects).
  • Prefer simple formats when you can. A plain .txt file or a flattened image carries far less than a layered document.
  • For the highest-stakes sharing, recreate rather than clean. Retyping content into a fresh document, or screenshotting and re-rendering, leaves nothing of the original’s history to leak.
  • Watch the obvious channels too. A filename like report_for_jane_smith_draft2.pdf leaks identity no metadata tool will catch, and content visible in the file — a username in a screenshot, a reflection in a photo — is outside metadata entirely.
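The filename check in the last point is easy to automate. The sketch below compares a filename against a list of identifiers you supply, ignoring case and separators. The function name is hypothetical, and the normalization only handles ASCII letters and digits, so names in other scripts would need a different approach.

```python
import re


def _normalize(text):
    """Lowercase and drop everything except ASCII letters and digits."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def filename_leaks(filename, identifiers):
    """Return the identifiers that appear in the filename.

    Matching ignores case and separators such as spaces, underscores,
    and hyphens.
    """
    haystack = _normalize(filename)
    return [
        ident
        for ident in identifiers
        if _normalize(ident) and _normalize(ident) in haystack
    ]


print(filename_leaks("report_for_jane_smith_draft2.pdf", ["Jane Smith", "jsmith"]))
```

Feed it your real name, your handles, and your employer. It cannot catch anything you did not think to list.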

Metadata removal is one discipline inside a larger habit. The same care that strips an EXIF tag is the care that keeps you from reusing a handle or logging in to the wrong account. Treat it as a standard step before anything leaves your control, pair it with the rest of a minimal privacy stack, and — for genuinely high-stakes work — assume the tool missed something and reduce what you share to the simplest possible form.

Sources

  1. Tails — Removing metadata from files
  2. mat2 — Metadata Anonymisation Toolkit (official repository)
  3. ExifTool — official site (Phil Harvey)
  4. Dangerzone — Freedom of the Press Foundation
