Remove text from a PDF #4221
Answered by JorjMcKie
samuelbradshaw asked this question in Looking for help
Hi! Is there an efficient way to remove/delete all text from a PDF with PyMuPDF?
Answered by JorjMcKie on Jan 13, 2025
The easiest way to remove all text is to use "redaction annotations" (on all or selected pages):

    import pymupdf

    doc = pymupdf.open("input.pdf")
    page = doc[0]  # 0 or any other 0-based page number
    page.add_redact_annot(page.rect)  # redaction annotation covering the full page
    page.apply_redactions(
        images=pymupdf.PDF_REDACT_IMAGE_NONE,  # keep the images
        graphics=pymupdf.PDF_REDACT_LINE_ART_NONE,  # keep vector graphics
    )

Erasing specific text works the same way, except that you have to determine the desired bounding box to use instead of page.rect:

    # extract text and full metadata only (no images)
    for block in page.get_text("dict", flags=pymupdf.TEXTFLAGS_TEXT)["blocks"]:
        for line in block["lines"]:
            for span in line["spans"]:
                if "unwanted font name" in span["font"]:
                    page.add_redact_annot(span["bbox"])  # cover text span with redact annot
    # apply once, after all matching spans on the page have been marked
    page.apply_redactions(
        images=pymupdf.PDF_REDACT_IMAGE_NONE,  # keep the images
        graphics=pymupdf.PDF_REDACT_LINE_ART_NONE,  # keep vector graphics
    )

Important:
Answer selected by
samuelbradshaw
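For anyone who wants to run this over a whole document, the two snippets in the answer can be folded into one helper. This is only a sketch under a few assumptions: the names `matching_span_rects` and `remove_text`, the `predicate` parameter, and the file names are hypothetical and not part of PyMuPDF, and the output path is assumed to differ from the input path. With no predicate the full page is redacted; with a predicate only the matching spans are.

```python
from typing import Callable, Optional


def matching_span_rects(text_dict: dict, predicate: Callable[[dict], bool]) -> list:
    """Collect the bboxes of all spans in a get_text("dict") result
    for which predicate(span) is true. (Hypothetical helper.)"""
    rects = []
    for block in text_dict.get("blocks", []):
        # blocks without a "lines" key (e.g. image blocks) are skipped
        for line in block.get("lines", []):
            for span in line["spans"]:
                if predicate(span):
                    rects.append(tuple(span["bbox"]))
    return rects


def remove_text(in_path: str, out_path: str,
                predicate: Optional[Callable[[dict], bool]] = None) -> None:
    """Remove all text (predicate is None) or only the matching spans
    from every page, keeping images and vector graphics. (Hypothetical helper.)"""
    import pymupdf  # imported lazily so the helper above works without PyMuPDF

    doc = pymupdf.open(in_path)
    for page in doc:
        if predicate is None:
            page.add_redact_annot(page.rect)  # cover the full page
        else:
            text = page.get_text("dict", flags=pymupdf.TEXTFLAGS_TEXT)
            for rect in matching_span_rects(text, predicate):
                page.add_redact_annot(rect)  # cover one text span
        page.apply_redactions(
            images=pymupdf.PDF_REDACT_IMAGE_NONE,  # keep the images
            graphics=pymupdf.PDF_REDACT_LINE_ART_NONE,  # keep vector graphics
        )
    doc.save(out_path)  # assumed to be a different file than in_path
    doc.close()
```

Usage might look like `remove_text("input.pdf", "no-text.pdf")` to strip everything, or `remove_text("input.pdf", "filtered.pdf", lambda span: "unwanted font name" in span["font"])` to erase only spans in a given font.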