docs: description of supported formats and backends #788

Open
wants to merge 4 commits into base: main

@ceberam (Contributor) commented Jan 22, 2025

  • Add documentation on supported formats and conversion backends
  • Add a notebook example on custom XML formats (USPTO and PMC)
  • Remove unnecessary imports from tests
  • Remove unnecessary type-ignore marks

Checklist:

  • Documentation has been updated, if necessary.
  • Examples have been added, if necessary.
  • Tests have been added, if necessary.

mergify bot commented Jan 22, 2025

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

  • title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:
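To see what this rule accepts, the title condition can be checked locally. The following is a minimal sketch, assuming that the `~=` operator behaves like a regular-expression search on the pull request title; the helper name `is_conventional` is hypothetical and only used for illustration.

```python
import re

# Pattern copied verbatim from the merge-protection rule above.
# Assumption: `~=` is treated here as a regex search against the title.
CONVENTIONAL_TITLE = re.compile(
    r"^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:"
)


def is_conventional(title: str) -> bool:
    """Return True if the title starts with a conventional-commit prefix."""
    return CONVENTIONAL_TITLE.search(title) is not None


# The title of this pull request passes the check.
pr_title_ok = is_conventional("docs: description of supported formats and backends")

# An optional scope in parentheses and a "!" marker are also accepted.
scoped_ok = is_conventional("feat(backend)!: add a new converter")

# A title without a recognized type prefix is rejected.
plain_ok = is_conventional("Update documentation")
```

The pattern only constrains the prefix up to the colon, so the wording of the description after it is not validated by this check.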

dolfim-ibm previously approved these changes Jan 22, 2025
mkdocs.yml (outdated, resolved)
"cell_type": "markdown",
"metadata": {},
"source": [
"# Backend converters for XML"
Contributor

To prevent any confusion (e.g. that any type of XML is supported) I would suggest something more explicit:

Suggested change
- "# Backend converters for XML"
+ "# Conversion of USPTO XML & PubMed XML"

Contributor Author

I agree, but I kept "Conversion of custom XML", since otherwise the left navigation menu would lose its simple line items.

Comment on lines 898 to 1055
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Delete temporary files\n",
"\n",
"The XML files used in this notebook, as well as the Milvus local database will be removed."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"import shutil\n",
"\n",
"shutil.rmtree(TEMP_DIR)"
]
}
Contributor

Since TEMP_DIR is defined as a temporary directory anyway (i.e. it will be removed upon restart), this part can also be dropped for simplicity (and to avoid any accidents with folder deletions).

Contributor Author

That does not seem to be the case for temporary directories. Restarting the kernel does not remove a temporary directory created with mkdtemp():

The user of mkdtemp() is responsible for deleting the temporary directory and its contents when done with it.
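The difference can be illustrated with a minimal sketch. It assumes the notebook creates TEMP_DIR with `tempfile.mkdtemp()`, as this reply implies; the variable and file names below are hypothetical. It contrasts the explicit cleanup that `mkdtemp()` requires with `tempfile.TemporaryDirectory`, which removes the directory when its context exits.

```python
import shutil
import tempfile
from pathlib import Path

# mkdtemp() only creates the directory; nothing in Python removes it later,
# so the caller has to clean it up explicitly.
manual_dir = Path(tempfile.mkdtemp())
(manual_dir / "sample.xml").write_text("<doc/>")
manual_exists_before = manual_dir.exists()
shutil.rmtree(manual_dir)  # explicit cleanup, as in the notebook cell
manual_exists_after = manual_dir.exists()

# TemporaryDirectory used as a context manager deletes the directory
# and its contents when the `with` block ends.
with tempfile.TemporaryDirectory() as tmp:
    managed_dir = Path(tmp)
    (managed_dir / "sample.xml").write_text("<doc/>")
    managed_exists_inside = managed_dir.exists()
managed_exists_after = managed_dir.exists()
```

A context manager is awkward to keep open across several notebook cells, which may be one reason to prefer `mkdtemp()` followed by an explicit `shutil.rmtree()` at the end.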

Contributor

I meant a system (OS) restart

After commit b74208 of docling-core, text items can be attached to any NodeItem
and therefore the ignore[arg-type] type marks can be removed.

Signed-off-by: Cesar Berrospi Ramis <[email protected]>
3 participants