Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WebP images in a gallery cause an unhandled exception during build #3671

Closed
6 of 9 tasks
tartley opened this issue Mar 23, 2023 · 5 comments
Closed
6 of 9 tasks

WebP images in a gallery cause an unhandled exception during build #3671

tartley opened this issue Mar 23, 2023 · 5 comments
Labels

Comments

@tartley
Copy link
Contributor

tartley commented Mar 23, 2023

Environment

Python Version:: Python3.10.6 (and 3.11.0rc1 behaves the same)
Nikola Version: v8.2.3 (and latest git clone of commit 483ffd8 Wed Mar 1 22:52:02 2023 behaves the same)
Operating System: Linux Pop_OS 22.04 LTS (derived from Ubuntu 22.04)

Description:

Hi, good people! I am happy to look into contributing a PR to fix this myself, but wanted to discuss it first before wading in. Thanks!

Symptom

Adding a WebP image (with '.webp' extension) to my gallery causes nikola build to fail partway through, with a traceback (below). This happens even on a fresh site, created using nikola init -dq site.

After the failure, running nikola serve displays the gallery correctly, including all the WebP images, but the gallery RSS feed is empty. Because this is so close to working, I'm hopeful that a small fix might be possible.

I notice that galleries with an empty file named "image.webp" provoke this error, while gallery files named "image", with no extension, or "image.nonsense", are correctly ignored. So I infer that webp images are expected to work in galleries. (and the section below, "Any other image types we should handle?", seems to corroborate.)

The expected behavior is:

  • WebP images ought to "just work", both at site generation time, and at gallery viewing time, along with their gallery RSS feed.
  • For any gallery image file with an unrecognized mime type, options for how we should behave are enumerated below in section "Be robust to files with unrecognized mime type".

The Traceback

$ nikola build
Scanning posts........done!
.  render_galleries:output/galleries/demo/sp.webp
.  render_galleries:output/galleries/demo/index.html
.  render_galleries:output/galleries/demo/rss.xml
TaskError - taskid:render_galleries:output/galleries/demo/rss.xml
PythonAction Error
Traceback (most recent call last):
  File "/home/jhartley/.virtualenvs/nikola/lib/python3.10/site-packages/doit/action.py", line 461, in execute
    returned_value = self.py_callable(*self.args, **kwargs)
  File "/home/jhartley/Dropbox/jon/data/code/3rdparty/nikola/nikola/nikola/plugins/task/galleries.py", line 773, in gallery_rss
    data = rss_obj.to_xml(encoding='utf-8')
  File "/home/jhartley/.virtualenvs/nikola/lib/python3.10/site-packages/PyRSS2Gen.py", line 39, in to_xml
    self.write_xml(f, encoding)
  File "/home/jhartley/.virtualenvs/nikola/lib/python3.10/site-packages/PyRSS2Gen.py", line 34, in write_xml
    self.publish(handler)
  File "/home/jhartley/.virtualenvs/nikola/lib/python3.10/site-packages/PyRSS2Gen.py", line 380, in publish
    item.publish(handler)
  File "/home/jhartley/.virtualenvs/nikola/lib/python3.10/site-packages/PyRSS2Gen.py", line 440, in publish
    self.enclosure.publish(handler)
  File "/home/jhartley/.virtualenvs/nikola/lib/python3.10/site-packages/PyRSS2Gen.py", line 221, in publish
    _element(handler, "enclosure", None,
  File "/home/jhartley/.virtualenvs/nikola/lib/python3.10/site-packages/PyRSS2Gen.py", line 47, in _element
    handler.startElement(name, d)
  File "/usr/lib/python3.10/xml/sax/saxutils.py", line 170, in startElement
    self._write(' %s=%s' % (name, quoteattr(value)))
  File "/usr/lib/python3.10/xml/sax/saxutils.py", line 60, in quoteattr
    data = escape(data, entities)
  File "/usr/lib/python3.10/xml/sax/saxutils.py", line 27, in escape
    data = data.replace("&", "&")
AttributeError: 'NoneType' object has no attribute 'replace'

The problem stems from slightly earlier, nikola/plugins/task/galleries.py, line 753:

mimetypes.guess_type(img)[0]

Where 'img' is the filename of the WebP image, ending with extension '.webp'. .guess_type() doesn't recognize the MIME type of this, and returns (None, None). The first of those None values causes problems a little later in the code (see the traceback), where it is used in XML/RSS generation.

Analysis

The docs for the standard library function .guess_type() say that it recognizes the MIME types that are registered with IANA. They list 'image/webp' in a draft RFC, so this isn't yet an official MIME type.

Sure enough, 'webp' is not included in the files consulted by the Python mimetypes package, which can be listed using:

>>> mimetypes.knownfiles
['/etc/mime.types', '/etc/httpd/mime.types', '/etc/httpd/conf/mime.types', '/etc/apache/mime.types', '/etc/apache2/mime.types', '/usr/local/etc/httpd/conf/mime.types', '/usr/local/lib/netscape/mime.types', '/usr/local/etc/httpd/conf/mime.types', '/usr/local/etc/mime.types']

So it's not surprising that guess_type() returns None.

Possible Solutions

I know nothing, so these might be bad ideas. Each checklist item is described below.

  • Inject a hardcoded 'image/web' mimetype at runtime, before .guess_type() gets called (Merged as #3674)
  • Add a test of gallery RSS content (Merged as #3676)
  • Be robust to files with unrecognized mime types
  • Check other image types we should handle. (Done in #3672.)
  • Review other calls to mimetype.guess_type().
  • Review whether "producing an RSS enclosure" is duplicated code.
  • Call mimetypes.init() (Rejected)
  • Consider manually adding the webp mime type to my machine (Rejected)
  • Consider automatically adding the webp mime type on Nikola install (Rejected)

Inject a hardcoded 'image/web' mimetype at runtime, before .guess_type() gets called

It sounds like the mimetypes module has ways to inject custom mimetypes into its internal list. We could add ".webp" -> "image/webp" before .guess_type() gets called.

Add a test of gallery RSS content

A new test of RSS content would be useful, currently I think this is only verified in the CI baseline tests. Once such a test exists in the Python test suite, it (or a minor variation on it) could be used to verify the proposed fallback for unknown mimetypes (see the next item below).

Be robust to files with unrecognized mime type

Regardless of whether we decide to support WebP images here or not, we should be robust to .guess_type() returning (None, None) for files with unrecognized mimetype. There must be a more graceful way to handle this than ending the build with an unhandled exception. But what should that be?

  • Fall back to a more generic "image" mimetype? Does such a thing exist? Or even more generally, an octet stream? This does exist, but presumably would not display in a browser. What would the practical upshot be for gallery viewers?
  • Display a placeholder broken image instead? What would be the best way to do this?
  • Don't include this image in the gallery at all?
  • Log the error so the site owner can diagnose what's going on?

Check other image types we should handle.

Are there other images types that might work in a gallery but are impacted by this problem? If I'm going to look at this, I might as well systematically test them.

Although that did uncover some other problems, tracked there, none of the other image types break the build like this. They all have a recognized mimetype.

Review other calls to mimetype.guess_type()

The code contains three calls to mimetype.guess_type():

  1. nikola/plugins/task/galleries.py, line 753: The code under discussion, for producing RSS enclosures for images in a gallery.
  2. nikola/nikola.py, line 374: Produces RSS enclosures for posts.
  3. nikola/plugins/command/auto/__init__.py, line 574: This code is doing something else, not related to RSS as far as I can see.

Do calls (2) and (3) also need guarding against (None, None) return values?

Review whether "producing an RSS enclosure" is duplicated code.

Do calls (1) and (2) above represent duplicated code, to produce an RSS enclosure, that should be extracted and shared in a function somehow? On first glance I don't think so, but plan to review the idea and would be interested to hear any opinions.

Rejected ideas

Rejected idea: Call mimetypes.init()

My first glance at the mimetypes docs pages made me think we ought to be calling mimetypes.init() before calling .guess_type(). But later I noticed the docs says:

If the module has not been initialized, [functions] will call init() if they rely on the information init() sets up.

So we don't need to call .init() explicitly.

Rejected idea: Consider manually adding the webp mime type to my machine

I see StackOverflow questions about not-entirely-dissimilar problems in other software, which are also caused by the lack of a known MIME type for WebP images. A suggested solution there is to add the 'image/webp' MIME type to one of the 'knownfiles' listed above.

This sounds like a temporary workaround for users who hit this today. But editing these files presumably conflicts with the distributions management of them, which will have bad side effects (e.g. updates could overwrite the user's changes.)

Rejected idea: Consider automatically adding the webp mime type on Nikola install

Instead of suggesting to users that they change the contents of the 'knownfiles' listed above, we could do it automatically for them on Nikola install. This sounds like a bad idea which could cause problems for us in the future (e.g. when our changes are overwritten by an automated update of the changed file, leaving the user in a broken state again.)

Also, this would probably require admin/root privs, which most Nikola installs (hopefully) do not have.

Thanks! ❤️

Huge thanks for all the amazing Nikola goodness! I've been using and loving it for years!

@Kwpolska
Copy link
Member

Thanks for this very detailed report!

I think we should do both of these:

  • Inject a hardcoded 'image/web[p]' mimetype at runtime, before .guess_type() gets called
  • Be robust to files with unrecognized mime type

As for this:

  • Review whether "producing an RSS enclosure" is duplicated code.

I think the best course of action would be to have a helper function that (a) ensures we have registered webp with the library, and (b) replaces None with application/octet-stream. since I suppose that would be enough.

Would you be willing to send a pull request with a patch to resolve this? (It can focus only on webp for now).

@tartley
Copy link
Contributor Author

tartley commented Mar 25, 2023

Yes, I already tried out some fixes very much like your recommendations (they work!) and plan to figure out how your tests work next, see how I should test the changes.

I'm also accumulating a small branch of superficial fixes, such as quashing deprecation warnings I see during the test run. I'll submit it as a separate PR. #3673

@tartley
Copy link
Contributor Author

tartley commented Mar 26, 2023

Observation: If I modify the gallery RSS generation code to return a valid but empty RSS file, then no tests fail. So I conclude this isn't actually tested (Ah, and I belatedly realize the coverage output after the tests confirms this), and my intent right now is to create a minimal new test for the gallery RSS generator.

Looking at a static site generator after so many years of looking at web API code for work is breaking my brain. I keep spending 30 seconds looking for the URL routing pointing to request handlers, before realizing, oh, hang on, that all doesn't exist here. :-)

@tartley
Copy link
Contributor Author

tartley commented Mar 26, 2023

Observation: Adding a .webp image to nikola/data/samplesite/galleries/demo makes many tests fail (e.g. some with the build exception described above, but others because sitemap.xml is not generated, etc). Adding the line mimetypes.add_type('image/webp', '.webp') to the gallery RSS generation fixes everything. Perhaps this is a sufficient test failure to justify making this one-line code change, thus saving me the work of having to write a new test for gallery rss generation. (or, at least, allowing us to decouple the tasks, so I could submit the extra test later, in a separate MP.)

Imma submit it, tell me what you think...

tartley added a commit to tartley/nikola that referenced this issue Mar 26, 2023
WebP images in a gallery cause `nikola build` to fail with an unhandled
AttributeError exception, as described in getnikola#3671.

This happens because the MIME type of WebP images is not known, and the
resulting None value causes the RSS generation code to choke.

This PR fixes that, by augmenting Python's list of known mimetypes with
a hardcoded WebP one, before we ask for the mimetype of each image in a
gallery.

This addresses the above issue's checklist item "*Inject a hardcoded
'image/web' mimetype at runtime, before .guess_type() gets called.*",
but does not complete the issue.
tartley added a commit to tartley/tartley.github.io that referenced this issue Mar 26, 2023
Required me to switch themes, galleries don't work in the previous one.

Also, webp images don't work in galleries generally, but I'm using
a patched version of Nikola to generate the site.
Discussed in [#3671](getnikola/nikola#3671).
tartley added a commit to tartley/nikola that referenced this issue Mar 26, 2023
WebP images in a gallery cause `nikola build` to fail with an unhandled
AttributeError exception, as described in getnikola#3671.

This happens because the MIME type of WebP images is not known, and the
resulting None value causes the RSS generation code to choke.

This PR fixes that, by augmenting Python's list of known mimetypes with
a hardcoded WebP one, before we ask for the mimetype of each image in a
gallery.

This addresses the above issue's checklist item "*Inject a hardcoded
'image/web' mimetype at runtime, before .guess_type() gets called.*",
but does not complete the issue.
tartley added a commit to tartley/nikola that referenced this issue Mar 27, 2023
WebP images in a gallery cause `nikola build` to fail with an unhandled
AttributeError exception, as described in getnikola#3671.

This happens because the MIME type of WebP images is not known, and the
resulting None value causes the RSS generation code to choke.

This PR fixes that, by augmenting Python's list of known mimetypes with
a hardcoded WebP one, before we ask for the mimetype of each image in a
gallery.

This addresses the above issue's checklist item "*Inject a hardcoded
'image/web' mimetype at runtime, before .guess_type() gets called.*",
but does not complete the issue.
@tartley
Copy link
Contributor Author

tartley commented Jul 8, 2023

I don't want to sully your project with unclosed issues, so although there are a few checkbox items still left undone on this, I'm going to close it, because I don't think I'm going to get around to doing the last few items in the forseeable.

Thanks for your patience and guidance getting the things done I wanted to see fixed! Thanks for Nikola!

@tartley tartley closed this as completed Jul 8, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants