Remove text from a PDF #4221

Jan 13, 2025 · 1 comments · 8 replies

The easiest way to remove all text is using "redaction annotations" (from all or selected pages):

import pymupdf

doc = pymupdf.open("input.pdf")
page = doc[0]  # 0 or any 0-based page number
page.add_redact_annot(page.rect)  # redaction annotation covering the full page
page.apply_redactions(
    images=pymupdf.PDF_REDACT_IMAGE_NONE,  # keep the images
    graphics=pymupdf.PDF_REDACT_LINE_ART_NONE,  # keep vector graphics
)
doc.save("output.pdf")  # "output.pdf" is a placeholder name for the result

Specific text erasures work the same way, except that you have to determine the bounding box of the text to erase and use it instead of page.rect.

# extract text with full metadata only (no images)
for block in page.get_text("dict", flags=pymupdf.TEXTFLAGS_TEXT)["blocks"]:
    for line in block["lines"]:
        ...  # the rest of the original snippet is truncated

Answer selected by samuelbradshaw