PageRange is not working #2545
-
I was trying to split PDFs into chunks. I have a 23-page PDF and I tried to split it into chunks of 16 pages each.

Environment
Which environment were you using when you encountered the problem?
Google Cloud

$ python -m platform
Linux-5.10.0-28-cloud-amd64-x86_64-with-glibc2.31
$ python -c "import pypdf;print(pypdf._debug_versions)"
pypdf==4.1.0, crypt_provider=('cryptography', '41.0.5'), PIL=10.0.1

Code + PDF
This is a minimal, complete example that shows the issue:

import base64
import pypdf #import PdfFileReader,PdfReader
from io import BytesIO

pdf_content = download_gcs_file_pdf(bucket_name, pdf_blob_name)

def split_pdf_to_pages(pdf_content, num_pages=16):
    """
    Splits a PDF content into specified number of pages and returns base64-encoded strings for each page.
    Args:
        pdf_content (bytes): Binary content of the PDF file.
        num_pages (int): Number of pages to split the PDF into (default is 16).
    Returns:
        list: List of base64-encoded strings representing each page.
    """
    print("entered in func 1")
    try:
        pdf1_reader = pdf_content
        total_pages = len(pdf1_reader.pages)
        print("total_pages:{}".format(total_pages))
        pages_per_chunk = total_pages // num_pages
        base64_pages = []
        for i in range(num_pages):
            print("Page number is {}".format(i))
            start_page = i * pages_per_chunk
            end_page = (i + 1) * pages_per_chunk
            pdf_chunk = pdf1_reader.PageRange(start_page, end_page)
            #PyPDF2.pagerange.PageRange
            print("entered in func 2")
            base64_encoded = base64.b64encode(pdf_chunk).decode("utf-8")
            base64_pages.append(base64_encoded)
        return base64_pages
    except Exception as e:
        print(f"Error splitting PDF to pages: {e}")
        return None

# Example usage
#print (download_gcs_file_pdf(bucket_name, pdf_blob_name))
pdf_file_path = download_gcs_file_pdf(bucket_name, pdf_blob_name)
#with open(pdf_file_path, "rb") as pdf_file:
#    pdf_content = pdf_file.read()
bytes_stream = BytesIO(pdf_file_path)
# Read from bytes_stream
reader = PdfReader(bytes_stream)
base64_pages = split_pdf_to_pages(reader)
if base64_pages:
    for i, page in enumerate(base64_pages):
        print(f"Page {i+1} (base64-encoded):\n{page}")
else:
    print("Failed to split PDF into pages.")

Share here the PDF file(s) that cause the issue.
Sorry, it's a confidential file so I can't share it.

Traceback
This is the complete traceback I see:

entered in func 1
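A side note on the arithmetic in the code above: `num_pages` is used as the number of chunks, not as the chunk size. With the 23-page file and `num_pages=16` described here, the integer division gives one page per chunk, so the loop only ever addresses pages 0 to 15. A small sketch of that arithmetic, without pypdf (the names `ranges`, `covered` and `missing` are only for illustration):

```python
total_pages = 23
num_pages = 16  # treated as the number of chunks by the loop in the question

pages_per_chunk = total_pages // num_pages  # 23 // 16 == 1

# Reproduce the (start_page, end_page) pairs the loop computes, without any PDF.
ranges = [(i * pages_per_chunk, (i + 1) * pages_per_chunk) for i in range(num_pages)]

# Work out which page indices are never covered by any of these ranges.
covered = {page for start, end in ranges for page in range(start, end)}
missing = sorted(set(range(total_pages)) - covered)

print(pages_per_chunk)        # 1
print(ranges[0], ranges[-1])  # (0, 1) (15, 16)
print(missing)                # [16, 17, 18, 19, 20, 21, 22]
```

So even with a working page-range call, this loop would produce 16 single-page chunks and skip the last 7 pages.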
Replies: 6 comments
-
Error : entered in func 1
-
PyPDF2 is not maintained any more. Please migrate to pypdf. And
-
@stefan6419846 I get the same error after changing to pypdf. Thanks for the quick response.

"Error splitting PDF to pages: 'PdfReader' object has no attribute 'PageRange'"
-
As already mentioned:
-
@dtm00777
-
I do not know when or how your code has ever worked, but your PageRange usage is wrong as well. You should really have a look at our docs before trying to implement something as you imagine it could work.

A possible solution might look like this:

import base64
from pypdf import PageRange, PdfReader, PdfWriter
from io import BytesIO

def split_pdf(reader, num_pages=16):
    # Note: num_pages is the number of chunks to produce, not the chunk size.
    try:
        total_pages = len(reader.pages)
        print("total_pages:", total_pages)
        pages_per_chunk = total_pages // num_pages
        for i in range(num_pages):
            print("Page number is", i)
            start_page = i * pages_per_chunk
            end_page = (i + 1) * pages_per_chunk
            # Copy the selected pages into a fresh writer and serialize it in memory.
            writer = PdfWriter()
            writer.append(reader, pages=PageRange(f"{start_page}:{end_page}"))
            pdf_chunk = BytesIO()
            writer.write(pdf_chunk)
            base64_encoded = base64.b64encode(pdf_chunk.getvalue()).decode("utf-8")
            yield base64_encoded
    except Exception as e:
        print(f"Error splitting PDF to pages: {e}")

pdf_file_path = "file.pdf"
with open(pdf_file_path, "rb") as pdf_file:
    pdf_content = pdf_file.read()
bytes_stream = BytesIO(pdf_content)
reader = PdfReader(bytes_stream)
base64_pages = list(split_pdf(reader, num_pages=3))
if base64_pages:
    for i, page in enumerate(base64_pages, start=1):
        print(f"Page {i} (base64-encoded):\n{page}")
else:
    print("Failed to split PDF into pages.")
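For the case described in the question (chunks of up to 16 pages each, so 16 + 7 pages for a 23-page file), a variant that takes the chunk size instead of the chunk count could look roughly like the sketch below. `chunk_ranges` and `split_pdf_by_page_count` are hypothetical helper names, not part of pypdf; the sketch assumes `PdfWriter.append` accepts a `PageRange` through `pages=`, the same way it is used elsewhere in this thread.

```python
import base64
from io import BytesIO


def chunk_ranges(total_pages, pages_per_chunk):
    """Return (start, end) page index pairs; end is exclusive, the last chunk may be shorter."""
    if pages_per_chunk < 1:
        raise ValueError("pages_per_chunk must be at least 1")
    return [
        (start, min(start + pages_per_chunk, total_pages))
        for start in range(0, total_pages, pages_per_chunk)
    ]


def split_pdf_by_page_count(reader, pages_per_chunk=16):
    """Yield one base64-encoded PDF per chunk of at most pages_per_chunk pages."""
    # Imported here so that chunk_ranges can be tried without pypdf installed.
    from pypdf import PageRange, PdfWriter

    for start, end in chunk_ranges(len(reader.pages), pages_per_chunk):
        writer = PdfWriter()
        writer.append(reader, pages=PageRange(f"{start}:{end}"))
        buffer = BytesIO()
        writer.write(buffer)
        yield base64.b64encode(buffer.getvalue()).decode("utf-8")


print(chunk_ranges(23, 16))  # [(0, 16), (16, 23)]
```

With a 23-page `file.pdf`, `list(split_pdf_by_page_count(PdfReader("file.pdf")))` should then give two base64 strings, one for the first 16 pages and one for the remaining 7.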