XMLParsedAsHTMLWarning #170

Open
kantskernel opened this issue Jun 9, 2022 · 4 comments

kantskernel commented Jun 9, 2022

I see the following warning when parsing an XML OFX file:

XMLParsedAsHTMLWarning: It looks like you're parsing an XML document using an HTML parser. If this really is an HTML document (maybe it's XHTML?), you can ignore or filter this warning. If it's XML, you should know that using an XML parser will be more reliable. To parse this document as XML, make sure you have the lxml package installed, and pass the keyword argument features="xml" into the BeautifulSoup constructor.
warnings.warn(

Going through previous decisions and the code's behavior, I think it is intentional that the HTML parser is used for XML (e.g. here).

I think the warning shouldn't be emitted at all, rather than me having to go in and specify XML in the constructor, but I might be misguided. Here is one more related issue I saw: https://github.com/EnergieID/entsoe-py/issues/180

My issue is not the same: I am using ofxparse in the context of beancount-reds-importers (FWIW).

Here is how my input file starts:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?OFX OFXHEADER="200" VERSION="220"
@redstreet

Ditto. Curious whether other users are hitting this too?

@thehilll

I see this too.

jseutter (Owner) commented Jun 16, 2023

So yes, it was intentional to parse the XML this way. I don't recall this warning message appearing in the past, so one of the dependencies (BeautifulSoup?) must have added it. I can take a look at silencing the warning, or if someone else happens to look at it first, I'd be happy to review the change.

When I wrote this library, parsing as XML would have been too strict and would fail, because SGML is a superset of XML. The HTML parser is more forgiving and simply ignores the bits it doesn't understand.
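
To illustrate the strict-versus-forgiving point, here is a minimal sketch that uses only the Python standard library rather than ofxparse's actual code. The `SGML_OFX` fragment is a made-up example in the older SGML style, where leaf tags are not closed, and `parses_as_xml`, `TagCollector`, and `collect_tags` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical SGML-style fragment: <CODE> and <SEVERITY> are never closed.
SGML_OFX = "<OFX><STATUS><CODE>0<SEVERITY>INFO</STATUS></OFX>"


def parses_as_xml(markup):
    """Return True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


class TagCollector(HTMLParser):
    """Record every start tag the forgiving HTML parser sees."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # html.parser reports tag names in lowercase.
        self.tags.append(tag)


def collect_tags(markup):
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


print(parses_as_xml(SGML_OFX))  # the strict parser rejects the unclosed tags
print(collect_tags(SGML_OFX))   # the HTML parser still walks every tag
```

The strict parser stops at the first mismatched closing tag, while the HTML parser keeps going, which is the trade-off described above.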

@redstreet

Thank you, @jseutter! I haven't looked at ofxparse, but this commit does exactly what is needed. I imagine you simply need to put it in the right file in ofxparse.
