Removing Author Name From PDF Annotations
Annotating PDFs with highlights and text notes can be an effective way to provide feedback to others reading the document or to those who authored it. However, some situations call for providing that feedback anonymously. For new annotations this is easy: make sure the author name is blank in your PDF application of choice before you start. Removing the author name from existing annotations is another matter and can prove difficult.
I recently needed to remove my name from a number of annotations in a PDF and struggled to find a quick and effective method for doing so. Adobe Acrobat Reader was the only option that appeared to let me edit the author field and clear it on each note, but my name was still visible when I opened the file in a different program to confirm the Adobe edits had worked.
After spending a bit of time searching for an alternative approach, I went down the path of writing a script to do this for me (here be dragons). I was surprised to learn that many parts of a PDF are structured as dictionaries, which makes them easy to read and modify using Python. The script below uses the pypdf library to read in our PDF file and write out a modified version.
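To make the dictionary idea concrete, here is a hand-written Python dict mirroring the kind of entries a text ("sticky note") annotation carries. The values are invented for illustration. pypdf hands these objects back as `DictionaryObject`, a subclass of Python's built-in `dict`, and its `NameObject` keys are `str` subclasses that compare equal to plain strings such as `'/T'`, which is why ordinary dict operations like `in`, `get()`, and `pop()` work on them.

```python
# A hand-written stand-in for one annotation dictionary. Keys are PDF names,
# written with a leading slash; the values are made up for illustration.
sticky_note = {
    "/Type": "/Annot",
    "/Subtype": "/Text",            # a "sticky note" style comment
    "/Rect": [72, 700, 96, 724],    # where the icon sits on the page, in points
    "/Contents": "Consider rewording this sentence.",
    "/T": "Jane Doe",               # the author label the script removes
}

# Ordinary dict operations are enough to inspect the entries.
for key, value in sticky_note.items():
    print(key, "=>", value)

print("/T" in sticky_note)  # True
```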
The script iterates through each page of the document looking for a dictionary key labeled `/Annots`, which holds that page's list of annotations. It then iterates through each annotation object looking for any that contain the key `/T`. This key stores the author name of each annotation, if one exists: it is the text label shown in the title bar of the annotation's pop-up window, conventionally the name of whoever added the annotation. We then remove that key from the dictionary with the `pop()` method. Finally, the modified pages are appended to a `PdfWriter` object, which writes them out to a new file on disk.
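The traversal described above can be tried without a real PDF. This sketch runs the same steps over plain nested dicts standing in for pypdf's page and annotation objects. The layout is a simplifying assumption: in an actual file the `/Annots` entries are usually indirect references that need `get_object()` first. `remove_author_entries` is a hypothetical helper name, not part of pypdf.

```python
def remove_author_entries(pages):
    """Remove the '/T' entry from every annotation and return the names removed.

    `pages` is a list of dicts standing in for pypdf page objects; each may hold
    an '/Annots' list of annotation dicts (a simplification of a real PDF).
    """
    removed = []
    for page in pages:
        # Pages without annotations simply have no '/Annots' key
        for annot in page.get("/Annots", []):
            # pop() with a default does nothing when the key is missing
            author = annot.pop("/T", None)
            if author is not None:
                removed.append(author)
    return removed


pages = [
    {"/Annots": [{"/Subtype": "/Text", "/T": "Jane Doe", "/Contents": "Typo here"}]},
    {},  # a page with no annotations
    {"/Annots": [{"/Subtype": "/Highlight", "/T": "Jane Doe"},
                 {"/Subtype": "/Link"}]},  # no /T entry on this one
]

print(remove_author_entries(pages))  # ['Jane Doe', 'Jane Doe']
print(any("/T" in a for p in pages for a in p.get("/Annots", [])))  # False
```

Using `pop('/T', None)` folds the membership check and the removal into a single call, and collecting the removed names gives a quick record of what was stripped.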
The input and output file paths are hard-coded in this example. I may revisit this in the future to allow it to run from the command line with a file path given as input, but that will have to be work for another day.
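```python
from pypdf import PdfReader, PdfWriter

reader = PdfReader('input.pdf')
writer = PdfWriter()

for page in reader.pages:
    # Skip pages that have no annotations
    if '/Annots' in page:
        for annot in page['/Annots']:
            # Resolve the (usually indirect) reference to the annotation dictionary
            obj = annot.get_object()
            # '/T' holds the author label shown in the annotation's pop-up
            if '/T' in obj:
                obj.pop('/T')

# Copy the modified pages into the writer and save them to a new file
writer.append_pages_from_reader(reader)
with open('output.pdf', 'wb') as fp:
    writer.write(fp)
```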
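The hard-coded paths could eventually give way to command-line arguments, as mentioned earlier. Here is a minimal sketch of that, assuming the standard library's `argparse` and keeping `output.pdf` as the default output name. `build_parser`, `strip_authors`, and `main` are hypothetical names, and the loop follows the same steps as the script above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Describe the command line: an input PDF and an optional output path."""
    parser = argparse.ArgumentParser(
        description="Remove author names (/T entries) from PDF annotations."
    )
    parser.add_argument("input", help="path to the annotated PDF")
    parser.add_argument(
        "-o", "--output", default="output.pdf",
        help="where to write the cleaned copy (default: output.pdf)",
    )
    return parser


def strip_authors(input_path: str, output_path: str) -> None:
    """Same steps as the hard-coded script, with the paths passed in."""
    # Imported here rather than at the top so the argument parsing can be
    # exercised even where pypdf is not installed.
    from pypdf import PdfReader, PdfWriter

    reader = PdfReader(input_path)
    writer = PdfWriter()
    for page in reader.pages:
        if "/Annots" in page:
            for annot in page["/Annots"]:
                # Drop the author label if present; no-op otherwise
                annot.get_object().pop("/T", None)
    writer.append_pages_from_reader(reader)
    with open(output_path, "wb") as fp:
        writer.write(fp)


def main(argv=None) -> None:
    """Parse the arguments (sys.argv by default) and clean the file."""
    args = build_parser().parse_args(argv)
    strip_authors(args.input, args.output)
```

Calling `main()` from the usual `if __name__ == "__main__":` guard would then allow something like `python strip_authors.py notes.pdf -o clean.pdf`, where the script and file names are only examples.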