This repository has been archived by the owner on Jan 4, 2024. It is now read-only.

Fix "error: Document is empty." when empty files are present in /docs #21

Open


@Muxiner Muxiner commented Sep 18, 2023

Symptom

When mkdocs-video is enabled, the build reports an ERROR if /docs contains an empty .md file.

ERROR   -  Error reading page 'EMPTY.md': Document is empty
Traceback (most recent call last):
  ...
  File "...\Python310\site-packages\mkdocs_video\plugin.py", line 28, in on_page_content
    content = lxml.html.fromstring(html)
  File "...\Python\Python310\site-packages\lxml\html\__init__.py", line 873, in fromstring
    doc = document_fromstring(html, parser=parser, base_url=base_url, **kw)
  File "...\Python\Python310\site-packages\lxml\html\__init__.py", line 761, in document_fromstring
    raise etree.ParserError(
lxml.etree.ParserError: Document is empty

def on_page_content(self, html, page, config, files):
    content = lxml.html.fromstring(html)
    tags = content.xpath(f'//img[@alt="{self.config["mark"]}" and @src]')
    for tag in tags:
        if not tag.attrib.get("src"):
            continue
        tag.getparent().replace(tag, self.create_repl_tag(tag))
    return lxml.html.tostring(content, encoding="unicode")

Analysis

Following the traceback, I looked at the code around line 761 of lxml\html\__init__.py:

def document_fromstring(html, parser=None, ensure_head_body=False, **kw):
    if parser is None:
        parser = html_parser
    value = etree.fromstring(html, parser, **kw)
    if value is None: # << Here causes the problem
        raise etree.ParserError(
            "Document is empty")
    if ensure_head_body and value.find('head') is None:
        value.insert(0, Element('head'))
    if ensure_head_body and value.find('body') is None:
        value.append(Element('body'))
    return value

at https://github.com/lxml/lxml/blob/762f62c5a1ab62ce37397aeeab2c27fdcc14ca66/src/lxml/html/__init__.py#L756-L767

My understanding is that mkdocs-video passes each page's rendered HTML to lxml.html.fromstring. For an empty Markdown file that HTML is empty, so etree.fromstring returns None for value and the ParserError is raised.

As long as an empty file is present in /docs, the ERROR is reported, even if the nav setting in mkdocs.yml does not explicitly include that file as a page.

Fix

As a simple fix, this patch adds an extra check to the on_page_content method. The method body runs only when the html string passed in is not empty, which skips empty files (there is nothing in them to process anyway).
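
The control flow of the guard can be sketched without lxml or MkDocs installed. This is an illustration only, not the plugin's code: rewrite_video_tags is a hypothetical stand-in for the lxml-based rewriting and is assumed to reject empty input the way lxml.html.fromstring does, and the simplified on_page_content here takes only the html argument.

```python
def rewrite_video_tags(html: str) -> str:
    """Hypothetical stand-in for the plugin's lxml-based tag rewriting.

    Assumption: like lxml.html.fromstring, it refuses empty input.
    """
    if not html:
        raise ValueError("Document is empty")
    # Placeholder transformation, not what mkdocs-video actually emits.
    return html.replace("<img", "<video")


def on_page_content(html: str) -> str:
    """Simplified hook: skip pages whose rendered HTML is empty."""
    if not html:
        # Nothing to rewrite, so hand the empty content back untouched.
        return html
    return rewrite_video_tags(html)


if __name__ == "__main__":
    print(repr(on_page_content("")))
    print(on_page_content('<img src="a.mp4">'))
```

The empty page never reaches the parser, so the build no longer stops on it, while non-empty pages are processed exactly as before.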


In hindsight, I think MkDocs itself allows empty files in /docs: without the plugin, site builds proceed without problems. With mkdocs-video enabled, site builds fail with the symptom described above. With this patch, site builds succeed again, as they did before. The fix may not be perfect and may need further changes to meet project standards.

continue
tag.getparent().replace(tag, self.create_repl_tag(tag))
return lxml.html.tostring(content, encoding="unicode")
if html:
Author


Added condition check for empty html objects

@mteichtahl

mteichtahl commented Sep 27, 2023

Any chance we can get this merged?

@pabloFuente

I am facing the same problem

@gorger3

gorger3 commented Oct 12, 2023

The same problem.
