Feature Request: Option to Remove Metadata from New Documents for Streaming Workflows #1464

Open
akramhecini opened this issue Jan 20, 2025 · 0 comments


Hi there,

First of all, thank you for the fantastic work on python-docx!

I’ve run into a bit of an issue while using the library to generate reports. In my workflow, I create a document where each plot is added as a new page, and when the file size exceeds 5 MB, I stream it to S3 for storage. However, when I reassemble the files from S3, the final document only includes the first page.

After some debugging, I discovered that the problem lies in the metadata added to new documents by python-docx. These metadata tags make each part of the document appear as a separate new document. By removing the metadata before uploading to S3, I was able to resolve the issue.

Currently, I have to rely on other Python libraries to strip out the metadata, which adds complexity to the workflow. It would be amazing if python-docx offered a built-in feature or function to handle this directly.
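
For illustration, a workaround of this kind might look roughly like the sketch below. It is not python-docx API: it uses only the standard library, and it assumes the metadata in question is the package's core-properties part (`docProps/core.xml`), which it replaces with an empty element rather than deleting, so the package relationships stay valid. The function name `strip_core_properties` is hypothetical.

```python
import io
import zipfile

# Assumption: "metadata" here means the OPC core-properties part of the package.
CORE_PROPS_PART = "docProps/core.xml"

# An empty but well-formed core-properties element (no author, dates, etc.).
EMPTY_CORE_PROPS = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>\n'
    "<cp:coreProperties "
    'xmlns:cp="http://schemas.openxmlformats.org/package/2006/metadata/core-properties" '
    'xmlns:dc="http://purl.org/dc/elements/1.1/" '
    'xmlns:dcterms="http://purl.org/dc/terms/" '
    'xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"/>'
).encode("utf-8")


def strip_core_properties(docx_bytes: bytes) -> bytes:
    """Return a copy of a .docx package with its core properties blanked.

    Hypothetical helper: every part is copied unchanged except
    docProps/core.xml, whose content is replaced by an empty element.
    """
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as src, zipfile.ZipFile(
        out, "w", zipfile.ZIP_DEFLATED
    ) as dst:
        for name in src.namelist():
            data = src.read(name)
            if name == CORE_PROPS_PART:
                data = EMPTY_CORE_PROPS
            dst.writestr(name, data)
    return out.getvalue()
```

In my case the input bytes would come from saving the document to an in-memory buffer (`Document.save()` accepts a file-like object) before the upload step. A built-in option would make this extra pass unnecessary.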

This would make the library even more user-friendly and save a lot of time for users with similar workflows.

Thank you for considering this, and I appreciate all the work that goes into maintaining the library!
