UnicodeDecodeError: 'utf-8' codec can't decode byte 0x96 in position 0: invalid start byte #471

Open
sentry-io bot opened this issue Mar 7, 2024 · 5 comments
Labels
bug Something isn't working

Comments

@sentry-io

sentry-io bot commented Mar 7, 2024

Sentry Issue: AUTHOR-TOOLS-36K

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x96 in position 0: invalid start byte
(8 additional frame(s) were not displayed)
...
  File "at/utils/authentication.py", line 37, in require_api_key
    return f(*args, **kwargs)
  File "at/utils/file.py", line 178, in check_file
    return f(*args, **kwargs)
  File "at/api.py", line 48, in render
    dir_path, filename = process_file(
  File "at/utils/processor.py", line 44, in process_file
    filename = md2xml(filename, logger)
  File "at/utils/processor.py", line 54, in md2xml
    first_line = file.readline().strip()
sentry-io bot added the bug (Something isn't working) label Mar 7, 2024
@kesara
Member

kesara commented Mar 7, 2024

Related to #469

@cabo
Contributor

cabo commented Mar 7, 2024

Right, I still get

SyntaxError: Unexpected token '<', "

when I try the slightly broken document that started #469 (which doesn't have anything suspicious in its first 6915 characters).

@cabo
Contributor

cabo commented Mar 8, 2024

So there seems to be some processing in Author Tools that relies on the data being valid UTF-8 before handing it over to kramdown-rfc.

@kesara
Member

kesara commented Mar 10, 2024

Yes, Author Tools reads the first line of a Markdown file to identify which Markdown tool to use.
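
For illustration, here is a minimal sketch of how a text-mode `readline()` can fail on the very first byte, as in the traceback above. It is not the actual `md2xml` code: the helper name `read_first_line_text`, the `encoding="utf-8"` argument, and the sample file contents are assumptions made for this example.

```python
import os
import tempfile


def read_first_line_text(path):
    """Hypothetical helper: read the first line in text mode with strict UTF-8."""
    with open(path, "r", encoding="utf-8") as file:
        return file.readline().strip()


# Assumed sample input: byte 0x96 can never start a UTF-8 sequence
# (it is, for example, an en dash in Windows-1252).
with tempfile.NamedTemporaryFile(delete=False, suffix=".md") as tmp:
    tmp.write(b"\x96 title\nbody\n")
    path = tmp.name

error = None
try:
    read_first_line_text(path)
except UnicodeDecodeError as exc:
    # Decoding happens while reading, so the failure surfaces in readline().
    error = exc
finally:
    os.remove(path)

print(error)
```

Because the decode is strict, a single bad byte in the first line is enough to abort the request before any Markdown tool is selected.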

@cabo
Contributor

cabo commented Mar 10, 2024

I see. You could do this with the raw bytes instead of requiring the whole file to be proper UTF-8.
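
The raw-bytes suggestion could look roughly like the sketch below. It is only an illustration under stated assumptions: the helper names are hypothetical, and the `b"---"` marker check is a made-up stand-in, since the thread does not say how Author Tools actually distinguishes the Markdown tools.

```python
import os
import tempfile


def read_first_line_bytes(path):
    """Hypothetical helper: return the first line as raw bytes, without decoding."""
    with open(path, "rb") as file:
        return file.readline().strip()


def read_first_line_lenient(path):
    """Hypothetical helper: decode the first line, replacing bad bytes with U+FFFD."""
    return read_first_line_bytes(path).decode("utf-8", errors="replace")


# Assumed sample input: a first line that contains the invalid byte 0x96.
with tempfile.NamedTemporaryFile(delete=False, suffix=".md") as tmp:
    tmp.write(b"--- \x96\ntitle: example\n")
    path = tmp.name

try:
    raw_first_line = read_first_line_bytes(path)
    text_first_line = read_first_line_lenient(path)
    # Illustrative marker check done on bytes, so no decoding is required.
    looks_like_front_matter = raw_first_line.startswith(b"---")
finally:
    os.remove(path)

print(raw_first_line, repr(text_first_line), looks_like_front_matter)
```

Comparing against byte strings avoids decoding entirely, while `errors="replace"` is an alternative when a `str` is needed downstream. Either way, the rest of the file is left for the Markdown tool itself to validate.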


2 participants