Remove Metadata Before You Share Files: A Practical Guide
Photos, PDFs, and Office documents carry hidden metadata that can deanonymize you — GPS coordinates, author names, timestamps, device serials. Here's how to find and strip it with mat2, ExifTool, and Dangerzone, and where each tool falls short.
You can do everything else right — anonymous OS, Tor, a fresh pseudonym — and still get unmasked by a file you shared. The reason is metadata: data about data, the invisible information that programs quietly attach to the files you create. A photo can carry the exact GPS coordinates where it was taken. A PDF can carry your real name. An Office document can carry a revision history and the username of every machine that touched it.
This is one of the most common and most underestimated ways people deanonymize themselves. The good news is that it’s fixable with a few free, open-source tools and a habit. Let’s look at what’s hiding in your files, how to strip it, and — just as importantly — where the tools have limits you need to respect.
What Metadata Is Actually In Your Files
Different file types carry different baggage. The Tails project’s guide on the subject is blunt about the stakes: metadata “can deanonymize you or expose private information.” Here’s the short tour.
Images. Photos taken with a phone or camera embed EXIF metadata. That commonly includes the date and time, the camera or phone model and sometimes its serial number, the camera settings — and, if location services were on, the precise GPS coordinates of where the photo was taken. A single vacation snapshot can hand someone your home address.
PDFs. PDF files store an author field, the software that created them, creation and modification timestamps, and sometimes the title and the path where the file was saved. A “Final_v3” PDF can quietly announce who wrote it and on what machine.
Office documents. Word, Excel, PowerPoint and their OpenDocument equivalents are the worst offenders. They can carry the author name, the company, a full revision history, comments, tracked changes, and the usernames of everyone who edited the file. People have been identified by the author field of a document they thought was anonymous.
Everything else. Audio files carry ID3 tags. Many file formats have a “last modified by” field or an embedded thumbnail. Even an innocuous-looking format can leak more than you expect.
The threat here isn’t abstract. If your threat model involves staying unlinked from a pseudonym, metadata is exactly the kind of mundane, technical leak that defeats otherwise careful operational security.
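To make the Office-document case concrete: a .docx file is a ZIP package, and its author fields normally sit in a member called docProps/core.xml. The sketch below reads a few of those fields with the Python standard library only. The function names, the choice of fields, and the in-memory sample document are illustrative assumptions, and real files can hold metadata in other places this does not inspect.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by docProps/core.xml in Office Open XML packages.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}

# Illustrative selection of identifying fields; not an exhaustive list.
FIELDS = {
    "creator": "dc:creator",
    "last_modified_by": "cp:lastModifiedBy",
    "created": "dcterms:created",
    "modified": "dcterms:modified",
    "revision": "cp:revision",
}


def read_core_properties(docx) -> dict:
    """Return identifying fields from docProps/core.xml (path or file object)."""
    with zipfile.ZipFile(docx) as package:
        if "docProps/core.xml" not in package.namelist():
            return {}
        root = ET.fromstring(package.read("docProps/core.xml"))
    found = {}
    for label, tag in FIELDS.items():
        element = root.find(tag, NS)
        if element is not None and element.text:
            found[label] = element.text
    return found


def make_sample_docx() -> io.BytesIO:
    """Build a tiny stand-in package holding only core.xml (hypothetical data)."""
    core_xml = (
        '<?xml version="1.0" encoding="UTF-8"?>'
        '<cp:coreProperties '
        'xmlns:cp="http://schemas.openxmlformats.org/package/2006/metadata/core-properties" '
        'xmlns:dc="http://purl.org/dc/elements/1.1/" '
        'xmlns:dcterms="http://purl.org/dc/terms/">'
        "<dc:creator>Jane Smith</dc:creator>"
        "<cp:lastModifiedBy>jsmith-laptop</cp:lastModifiedBy>"
        "<cp:revision>7</cp:revision>"
        "</cp:coreProperties>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("docProps/core.xml", core_xml)
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    print(read_core_properties(make_sample_docx()))
```

Pointing read_core_properties at a real document path works the same way. An empty result only means these particular fields were absent, not that the file is clean.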
Step 1: Look Before You Strip
Before removing anything, see what’s there. ExifTool, the de facto standard tool for reading metadata (maintained by Phil Harvey), will dump everything it can find:
exiftool photo.jpg
That prints the full tag list: camera, timestamps, GPS, the works. Run it on a file you’re about to share and you’ll usually be surprised by how much is in there.
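When the full dump is too long to scan by eye, one option is to ask ExifTool for JSON output (its -json option) and filter for the tags most likely to identify you. The following is a minimal sketch: the SENSITIVE_TAGS selection and both function names are assumptions made for illustration, not a complete list of risky tags.

```python
import json
import shutil
import subprocess

# Tag names as ExifTool reports them; this selection is illustrative only.
SENSITIVE_TAGS = {
    "GPSLatitude", "GPSLongitude", "GPSPosition",
    "Make", "Model", "SerialNumber",
    "Author", "Creator", "Artist",
    "DateTimeOriginal", "CreateDate", "ModifyDate",
}


def sensitive_subset(tags: dict) -> dict:
    """Keep only the tags whose names are in SENSITIVE_TAGS."""
    return {name: value for name, value in tags.items() if name in SENSITIVE_TAGS}


def read_tags(path: str) -> dict:
    """Run `exiftool -json` on one file and return its tags as a dict."""
    if shutil.which("exiftool") is None:
        raise RuntimeError("exiftool is not installed or not on PATH")
    result = subprocess.run(
        ["exiftool", "-json", path],
        capture_output=True, text=True, check=True,
    )
    # ExifTool prints a JSON array with one object per input file.
    return json.loads(result.stdout)[0]


if __name__ == "__main__":
    import sys
    for name, value in sensitive_subset(read_tags(sys.argv[1])).items():
        print(f"{name}: {value}")
```

A filter like this is a convenience for spotting the obvious leaks. The unfiltered exiftool output remains the thing to read before sharing anything that matters.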
For a quick, all-at-once look from the command line, mat2 (the Metadata Anonymisation Toolkit) has a show mode:
mat2 --show document.pdf
This lists the harmful metadata mat2 can detect without changing the file. Use it as a sanity check.
Step 2: Strip Metadata From Images and Most Files With mat2
For day-to-day metadata removal across a wide range of formats, mat2 is the right default tool. It’s open source, it’s what the Tails operating system ships (wrapped in a graphical app called Metadata Cleaner), and it supports a long list of formats: JPEG, PNG, GIF, TIFF, WebP, PDF, the Office and OpenDocument families, MP3/FLAC and other audio, MP4 and other video, SVG, EPUB, ZIP and tar archives, torrents, and more.
The basic usage is simple:
mat2 photo.jpg
Two behaviors are worth knowing. First, mat2 does not edit in place — it writes a new file with .cleaned inserted before the extension, so photo.jpg becomes photo.cleaned.jpg and your original is untouched. Second, by default mat2 may alter the data itself to scrub metadata as thoroughly as possible — for example, re-compressing an image or making PDF text non-selectable. If you need to guarantee the file data is left intact and will accept that some metadata may remain, use lightweight mode:
mat2 -L photo.jpg
That tradeoff — thorough cleaning versus byte-for-byte data preservation — is yours to make based on what matters more for the file in question.
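For a whole folder of files, the same two commands can be planned from a small script. The sketch below only builds the mat2 command lines and predicts the output names from the .cleaned naming rule described above. The helper names are hypothetical, and the name prediction assumes a single extension, so multi-part names such as .tar.gz may come out differently.

```python
from pathlib import Path


def cleaned_name(path: Path) -> Path:
    """Predict the output name for a single-extension file: photo.jpg -> photo.cleaned.jpg."""
    return path.with_name(f"{path.stem}.cleaned{path.suffix}")


def mat2_command(path: Path, lightweight: bool = False) -> list[str]:
    """Build the argument list for mat2, adding -L for lightweight mode."""
    command = ["mat2"]
    if lightweight:
        command.append("-L")
    command.append(str(path))
    return command


def plan_batch(paths, lightweight: bool = False):
    """Pair each command with its expected output, skipping already-cleaned files."""
    return [
        (mat2_command(path, lightweight), cleaned_name(path))
        for path in paths
        if ".cleaned" not in path.suffixes
    ]


if __name__ == "__main__":
    for command, expected in plan_batch(sorted(Path(".").glob("*.jpg"))):
        print(" ".join(command), "->", expected)
```

Each planned command can then be executed with subprocess.run(command, check=True), after which it is worth confirming that the expected output file exists before deleting or sharing anything.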
Step 3: Strip Metadata With ExifTool (and the PDF Caveat)
ExifTool can also remove metadata, not just read it. The documented way to strip everything is:
exiftool -all= photo.jpg
For images this works well, with one housekeeping note: by default ExifTool keeps the untouched original next to the cleaned file with _original appended to the name (photo.jpg_original), so delete that backup before you share the folder. For PDFs, there is a critical catch. ExifTool’s own documentation states that changes to PDF files are reversible, because “the original information is never actually deleted from the file.” ExifTool simply appends an update that hides the old metadata; the original is still recoverable. That means ExifTool alone is not a safe way to scrub a PDF that an adversary might examine.
The standard fix is to follow up with qpdf, which can rewrite the file and drop the now-unreferenced original objects:
exiftool -all= file.pdf
qpdf --linearize file.pdf file.cleaned.pdf
Or — simpler and less error-prone for PDFs — just use mat2, which handles this properly.
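One rough way to see the "hidden, not deleted" problem is to search the raw bytes of the output file for strings you know should be gone, such as your name or username. The sketch below does that in a few common text encodings. The function name and the sample byte strings are assumptions for illustration, and the sample is a toy stand-in rather than a valid PDF. Real PDFs can compress or otherwise encode their contents, so an empty result proves nothing; a hit, on the other hand, is a clear sign the file is not ready to share.

```python
def find_canaries(data: bytes, canaries: list[str]) -> dict[str, list[str]]:
    """Map each canary string to the encodings under which it appears in data."""
    hits = {}
    for canary in canaries:
        if not canary:
            continue  # an empty string would match everywhere
        found = [
            encoding
            for encoding in ("utf-8", "utf-16-le", "utf-16-be")
            if canary.encode(encoding) in data
        ]
        if found:
            hits[canary] = found
    return hits


# Toy bytes imitating a file whose old metadata was superseded by an
# appended update instead of being removed (not a valid PDF).
ORIGINAL_PART = b"%PDF-1.4\n1 0 obj\n<< /Author (Jane Smith) >>\nendobj\n%%EOF\n"
APPENDED_UPDATE = b"1 0 obj\n<< >>\nendobj\n%%EOF\n"

if __name__ == "__main__":
    print(find_canaries(ORIGINAL_PART + APPENDED_UPDATE, ["Jane Smith"]))
    print(find_canaries(APPENDED_UPDATE, ["Jane Smith"]))
```

On a real file, the input would be something like Path("file.cleaned.pdf").read_bytes(), checked against a short private list of names, usernames, and hostnames that must never appear.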
Step 4: When the File Came From Someone Else — Dangerzone
The tools above are for files you created and want to clean before sharing. There’s a related but distinct risk: a file someone sent you that might be booby-trapped. For that, Dangerzone, an open-source tool from the Freedom of the Press Foundation, takes a different approach.
Dangerzone opens a potentially dangerous PDF, Office document, or image inside an offline sandbox, renders every page down to raw pixels, and then rebuilds a fresh, safe PDF from those pixels outside the sandbox. Because the output is reconstructed from pixel data, it carries none of the original file’s embedded code — and, as a side effect, none of the original document’s metadata survives the round trip either. Its primary purpose is defanging malware in untrusted documents, but the pixel-rebuild also produces a clean, metadata-free PDF.
It’s not a replacement for mat2 on your own files — re-rendering to pixels is lossy and overkill for routine cleaning — but it’s the right tool when you don’t trust the file’s origin.
The Limits You Must Respect
No metadata tool is magic, and overconfidence here is dangerous. Both mat2 and Tails state the same essential caveat plainly: there is no reliable way to detect and remove every possible piece of metadata in complex file formats. mat2’s documentation warns that a clean --show output does not prove a file is metadata-free. Tails notes that an Office document can contain embedded files that carry their own metadata which the cleaner can’t fully reach.
Practical consequences:
- Don’t trust a single pass blindly for high-stakes files in complex formats (especially Office documents and PDFs with embedded objects).
- Prefer simple formats when you can. A plain .txt file or a flattened image carries far less than a layered document.
- For the highest-stakes sharing, recreate rather than clean. Retyping content into a fresh document, or screenshotting and re-rendering, leaves nothing of the original’s history to leak.
- Watch the obvious channels too. A filename like report_for_jane_smith_draft2.pdf leaks identity no metadata tool will catch, and content visible in the file (a username in a screenshot, a reflection in a photo) is outside metadata entirely.
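The filename check in particular is easy to automate. As a minimal sketch, the function below compares a file name against a private list of identity terms, ignoring case and separators. The function name and the term list are hypothetical, and the normalization only handles ASCII letters and digits, so names in other scripts would need a different approach.

```python
import re
from pathlib import Path


def identity_terms_in_name(filename: str, identity_terms: list[str]) -> list[str]:
    """Return the identity terms that appear in the file name.

    Matching ignores case and separators such as underscores, dots, and spaces.
    """
    stem = Path(filename).stem.lower()
    normalized = re.sub(r"[^a-z0-9]+", "", stem)
    hits = []
    for term in identity_terms:
        key = re.sub(r"[^a-z0-9]+", "", term.lower())
        if key and key in normalized:
            hits.append(term)
    return hits


if __name__ == "__main__":
    terms = ["Jane Smith", "jsmith", "Acme"]  # hypothetical private list
    print(identity_terms_in_name("report_for_jane_smith_draft2.pdf", terms))
```

Renaming to something neutral before sharing removes this leak, while what is visible inside the file still needs a human look.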
Metadata removal is one discipline inside a larger habit. The same care that strips an EXIF tag is the care that keeps you from reusing a handle or logging in to the wrong account. Treat it as a standard step before anything leaves your control, pair it with the rest of a minimal privacy stack, and — for genuinely high-stakes work — assume the tool missed something and reduce what you share to the simplest possible form.