Feature Request: Option to Remove Metadata from New Documents for Streaming Workflows #1464

akramhecini · 2025-01-20T15:55:16Z

Hi there,

First of all, thank you for the fantastic work on python-docx!

I’ve run into an issue while using the library to generate reports. In my workflow, I build a document in which each plot is added on a new page, and whenever the file size exceeds 5 MB, I stream that part to S3 for storage. However, when I reassemble the parts from S3, the final document only includes the first page.

After some debugging, I discovered that the problem lies in the metadata python-docx adds to new documents. These metadata tags make each part appear to be a separate new document. By removing the metadata before uploading to S3, I was able to resolve the issue.

Currently, I have to rely on other Python libraries to strip out the metadata, which adds complexity to the workflow. It would be amazing if python-docx offered a built-in feature or function to handle this directly.
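For illustration, here is a minimal, standard-library-only sketch of the kind of stripping step I mean. It assumes that "metadata" refers to the core properties stored in the `docProps/core.xml` part of the .docx package. The function name `strip_core_properties` is hypothetical and not part of python-docx, and removing these properties may not be sufficient for every workflow.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Assumption: the metadata in question is the core-properties part of the package.
CORE_PROPS_PART = "docProps/core.xml"


def strip_core_properties(docx_bytes: bytes) -> bytes:
    """Return a copy of a .docx with all core properties removed.

    Hypothetical helper, not a python-docx API. The core-properties part
    itself is kept (so package relationships stay valid); only its child
    elements (author, title, timestamps, ...) are dropped.
    """
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as src, \
            zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            if item.filename == CORE_PROPS_PART:
                root = ET.fromstring(data)
                # Remove every property element, leaving an empty root.
                for child in list(root):
                    root.remove(child)
                data = ET.tostring(root, encoding="utf-8", xml_declaration=True)
            # Copy every other part through unchanged.
            dst.writestr(item, data)
    return out.getvalue()
```

With something like this built in, a document saved to an in-memory buffer via `document.save(buffer)` could be cleaned before upload without pulling in another library.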

This would make the library even more user-friendly and save a lot of time for users with similar workflows.

Thank you for considering this, and I appreciate all the work that goes into maintaining the library!
