Describe the bug
There is a discrepancy between the CLI and the Python API when using the download_dir parameter of unstructured-ingest in a Box -> Azure Cognitive Search pipeline. The CLI correctly downloads files to the specified directory, while the Python runner tries to write files under the root directory (/) instead, which fails with a "Read-only file system" error.
To Reproduce
CLI (working):
    unstructured-ingest box \
      --box-app-config box_config_test.json \
      --remote-url box://12345 \
      --work-dir ./unstructured/ \
      --output-dir ./unstructured/ \
      --download-dir ./unstructured/ \
      --num-processes 1 \
      --raise-on-error \
      --verbose \
      --recursive \
      --re-download
Python runner (throws an error):
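    from unstructured.ingest.connector.box import BoxAccessConfig, SimpleBoxConfig
    from unstructured.ingest.interfaces import PartitionConfig, ProcessorConfig, ReadConfig
    from unstructured.ingest.runner import BoxRunner

    runner = BoxRunner(
        processor_config=ProcessorConfig(
            work_dir="./unstructured/",
            verbose=True,
            raise_on_error=True,
            output_dir="./unstructured/",
            num_processes=1,
        ),
        # download_dir is set here, mirroring --download-dir in the CLI call
        read_config=ReadConfig(
            download_dir="./unstructured/",
            re_download=True,
        ),
        partition_config=PartitionConfig(),
        connector_config=SimpleBoxConfig(
            remote_url="box://12345",
            recursive=True,
            access_config=BoxAccessConfig(
                box_app_config="./box_config_test.json"),
        ),
    )
    runner.run()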
Error message: "unstructured.ingest.error.SourceConnectionError: Error in getting data from upstream data source: [Errno 30] Read-only file system: '/{here is the folder as in the box itself}'"
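The failing path starts with "/", which hints at one possible cause (a hypothesis only, not checked against the unstructured source): if the download path is built by joining download_dir with a Box folder path that begins with "/", Python's os.path.join discards the download directory entirely. The folder and file names below are placeholders, not values from the report.

```python
import os

# Placeholder values: the download_dir from ReadConfig and a hypothetical
# Box folder path that starts with "/".
download_dir = "./unstructured/"
remote_path = "/some-box-folder/report.pdf"

# os.path.join drops all earlier components once it meets an absolute one,
# so the download directory is silently lost and the write targets "/".
target = os.path.join(download_dir, remote_path)
print(target)  # → /some-box-folder/report.pdf (on POSIX)
```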
Expected behavior
The Python implementation should respect the download_dir parameter in the ReadConfig and download files to the specified directory, just like the CLI does.
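Under the same assumption, the expected behavior amounts to treating the Box path as relative to download_dir. A minimal sketch, using a hypothetical helper name (resolve_download_path is not part of unstructured):

```python
from pathlib import Path


def resolve_download_path(download_dir: str, remote_path: str) -> Path:
    """Hypothetical helper: place a Box file under download_dir."""
    # Strip leading slashes so the remote path is joined as a relative path
    # instead of replacing download_dir.
    relative = remote_path.lstrip("/")
    return Path(download_dir) / relative


print(resolve_download_path("./unstructured/", "/some-box-folder/report.pdf"))
# → unstructured/some-box-folder/report.pdf (on POSIX)
```

This sketch does not guard against ".." segments in remote paths; a real fix would likely also need that check.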
Environment Info
unstructured: 0.14.9 (issue also present in version 0.12.x)