hebrew pdf documents and web urls gives latin-1 error #1333

Open
freshuk opened this issue Sep 22, 2024 · 6 comments
Labels
bug Something isn't working

Comments

@freshuk

freshuk commented Sep 22, 2024

Describe the bug

When uploading PDF documents or entering web URLs on the Ingest Documents screen, I get an error that most of the time looks like this:
Traceback (most recent call last):
  File "/usr/local/src/myscripts/admin/pages/01_Ingest_Data.py", line 95, in <module>
    st.session_state["file_url"] = blob_client.upload_file(
                                   ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/src/myscripts/admin/batch/utilities/helpers/azure_blob_storage_client.py", line 119, in upload_file
    blob_client.upload_blob(
  File "/usr/local/lib/python3.11/site-packages/azure/core/tracing/decorator.py", line 105, in wrapper_use_tracer
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/azure/storage/blob/_blob_client.py", line 775, in upload_blob
    return upload_block_blob(**options)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/azure/storage/blob/_upload_helpers.py", line 102, in upload_block_blob
    response = client.upload(
               ^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/azure/core/tracing/decorator.py", line 105, in wrapper_use_tracer
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/azure/storage/blob/_generated/operations/_block_blob_operations.py", line 846, in upload
    pipeline_response: PipelineResponse = self._client._pipeline.run(  # pylint: disable=protected-access
                                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/azure/core/pipeline/_base.py", line 229, in run
    return first_node.send(pipeline_request)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/azure/core/pipeline/_base.py", line 86, in send
    response = self.next.send(request)
               ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/azure/core/pipeline/_base.py", line 86, in send
    response = self.next.send(request)
               ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/azure/core/pipeline/_base.py", line 86, in send
    response = self.next.send(request)
               ^^^^^^^^^^^^^^^^^^^^^^^
  [Previous line repeated 2 more times]
  File "/usr/local/lib/python3.11/site-packages/azure/core/pipeline/policies/_redirect.py", line 197, in send
    response = self.next.send(request)
               ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/azure/core/pipeline/_base.py", line 86, in send
    response = self.next.send(request)
               ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/azure/storage/blob/_shared/policies.py", line 529, in send
    response = self.next.send(request)
               ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/azure/core/pipeline/_base.py", line 86, in send
    response = self.next.send(request)
               ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/azure/core/pipeline/_base.py", line 86, in send
    response = self.next.send(request)
               ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/azure/core/pipeline/_base.py", line 86, in send
    response = self.next.send(request)
               ^^^^^^^^^^^^^^^^^^^^^^^
  [Previous line repeated 1 more time]
  File "/usr/local/lib/python3.11/site-packages/azure/storage/blob/_shared/policies.py", line 302, in send
    response = self.next.send(request)
               ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/azure/core/pipeline/_base.py", line 86, in send
    response = self.next.send(request)
               ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/azure/core/pipeline/_base.py", line 86, in send
    response = self.next.send(request)
               ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/azure/core/pipeline/_base.py", line 118, in send
    self._sender.send(request.http_request, **request.context.options),
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/azure/storage/blob/_shared/base_client.py", line 348, in send
    return self._transport.send(request, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/azure/core/pipeline/transport/_requests_basic.py", line 355, in send
    response = self.session.request(  # type: ignore
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/requests/sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/opentelemetry/instrumentation/requests/__init__.py", line 180, in instrumented_send
    return wrapped_send(self, request, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/requests/sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/requests/adapters.py", line 667, in send
    resp = conn.urlopen(
           ^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/opentelemetry/instrumentation/urllib3/__init__.py", line 316, in instrumented_urlopen
    return wrapped(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/urllib3/connectionpool.py", line 789, in urlopen
    response = self._make_request(
               ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/urllib3/connectionpool.py", line 495, in _make_request
    conn.request(
  File "/usr/local/lib/python3.11/site-packages/urllib3/connection.py", line 397, in request
    self.putheader(header, value)
  File "/usr/local/lib/python3.11/site-packages/urllib3/connection.py", line 311, in putheader
    super().putheader(header, *values)
  File "/usr/local/lib/python3.11/http/client.py", line 1267, in putheader
    values[i] = one_value.encode('latin-1')
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 0-4: ordinal not in range(256)
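The last frame of the traceback shows `http.client` encoding a header value as Latin-1, which cannot represent Hebrew characters. A minimal sketch of that failure using only the standard library follows. The sample string and the helper name `encodes_as_latin1` are assumptions for illustration, not values taken from the report:

```python
# Hypothetical header value: the Hebrew word "shalom" plus ".pdf".
# The actual value that triggered the error is not given in the report.
header_value = "\u05e9\u05dc\u05d5\u05dd.pdf"


def encodes_as_latin1(value: str) -> bool:
    """Return True if value survives the same encode step as in putheader."""
    try:
        value.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


print(encodes_as_latin1("report.pdf"))  # True: plain ASCII is fine
print(encodes_as_latin1(header_value))  # False: Hebrew is outside Latin-1
```

Any string that ends up in a request header and contains characters above U+00FF would fail in the same way.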

Steps to reproduce

Steps to reproduce the behavior:

  1. Go to 'admin url'
  2. Click on 'ingest documents'
  3. Upload any Hebrew PDF document or enter any Hebrew URL
  4. See error

Screenshots

https://snipboard.io/w9y8gB.jpg

@freshuk freshuk added the bug Something isn't working label Sep 22, 2024
@Govardhana-Microsoft

@freshuk We are able to reproduce the issue. It seems to be an issue with the file name. Our team is looking into this.
@Roopan-Microsoft
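
If the file name is what ends up in a request header (an assumption, since the thread does not say which header carries it), one possible workaround is to percent-encode the name before it is sent and decode it when it is read back. The sketch below uses only `urllib.parse` from the standard library. The helper names `header_safe` and `header_restore` are hypothetical and not part of the project or the Azure SDK:

```python
from urllib.parse import quote, unquote


def header_safe(name: str) -> str:
    """Percent-encode name as UTF-8 so the result is pure ASCII."""
    return quote(name, safe="")


def header_restore(encoded: str) -> str:
    """Reverse header_safe and recover the original name."""
    return unquote(encoded)


# Hypothetical Hebrew file name, used for illustration only.
original = "\u05e9\u05dc\u05d5\u05dd.pdf"
encoded = header_safe(original)

# The encoded form is ASCII, so the Latin-1 encode step no longer raises.
encoded.encode("latin-1")
print(encoded)
print(header_restore(encoded) == original)
```

The trade-off is that every consumer of the stored value has to decode it before display, so this only fits if the value is read back through code the project controls.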

@freshuk
Author

freshuk commented Sep 23, 2024 via email

@Prasanjeet-Microsoft
Contributor

@freshuk We are currently addressing this issue and will keep you updated.

@Prasanjeet-Microsoft
Contributor

Prasanjeet-Microsoft commented Sep 27, 2024

@freshuk Can you please provide the URLs for which you are getting errors while uploading on the ingest documents screen?

@Prasanjeet-Microsoft
Contributor

Prasanjeet-Microsoft commented Sep 30, 2024

Hello @freshuk, I’d like to inform you that we are unable to reproduce the issue with the URL. Below is the evidence for your reference:

Recording.2024-09-30.210113.mp4
Recording.2024-09-30.211057.mp4

Could you please share how you are attempting to process the web pages/URLs?

@Prasanjeet-Microsoft
Contributor

@freshuk Could you please provide the specific URLs where you are encountering errors while uploading or processing on the ingest documents screen? This information will help us troubleshoot the issue more effectively.
