Generic XML adapter #389
@blackerby, do you have an example of the XML response you'd want to query? The generic JSON adapter lets the user define a JSONPath expression; it shouldn't be hard to duplicate it and build an XML adapter that uses XPath instead.
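To make the JSONPath-to-XPath analogy concrete, here is a minimal, stdlib-only sketch of how a path such as `/api-root/bills/bill` could select the elements that become rows. It is not Shillelagh's implementation: `select_elements` is a hypothetical helper, and it relies on `xml.etree.ElementTree`, which supports only a subset of XPath. This helper handles simple absolute child paths only, not `//` descendant expressions.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def select_elements(xml_text: str, path: str) -> list[ET.Element]:
    """Hypothetical helper: return elements matching a path like '/api-root/bills/bill'.

    Element.findall() rejects absolute paths, so the first step is compared
    against the root tag and the remaining steps are evaluated relative to it.
    Only simple child steps are handled (no '//' descendant syntax).
    """
    root = ET.fromstring(xml_text)
    parts = [part for part in path.split("/") if part]
    if not parts or parts[0] != root.tag:
        return []  # path does not start at this document's root
    if len(parts) == 1:
        return [root]
    return root.findall("/".join(parts[1:]))


sample = """<?xml version="1.0" encoding="utf-8"?>
<api-root>
  <bills>
    <bill><number>416</number></bill>
    <bill><number>415</number></bill>
  </bills>
</api-root>"""

bills = select_elements(sample, "/api-root/bills/bill")
print([bill.findtext("number") for bill in bills])  # ['416', '415']
```

The design mirrors the JSON adapter's idea of "URL plus a path expression in the fragment"; a real adapter would likely use a full XPath engine rather than this restricted path walker.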
I suspected duplicating the generic JSON adapter for XML would be the way to go. Right now I'm working on a different custom adapter (your tutorial is awesome, by the way; thanks so much for the good docs), but once that's in good shape I can try to get started on this, or pitch in if someone else wants to start work on it. Below is the response to a GET request to the following URL:
Note: this specific API offers a JSON response option, but I've got other use cases that only offer XML, so this is just offered for the sake of example.

<?xml version="1.0" encoding="utf-8"?>
<api-root>
  <bills>
    <bill>
      <congress>118</congress>
      <type>SRES</type>
      <originChamber>Senate</originChamber>
      <originChamberCode>S</originChamberCode>
      <number>416</number>
      <url>https://api.congress.gov/v3/bill/118/sres/416?format=xml</url>
      <title>A resolution to authorize testimony and representation in United States v. Sullivan.</title>
      <updateDateIncludingText>2023-10-19T12:43:41Z</updateDateIncludingText>
      <latestAction>
        <actionDate>2023-10-18</actionDate>
        <text>Submitted in the Senate, considered, and agreed to without amendment and with a preamble by Unanimous Consent. (consideration: CR S5082-5083; text: CR S5091)</text>
      </latestAction>
      <updateDate>2023-10-19</updateDate>
    </bill>
    <bill>
      <congress>118</congress>
      <type>SRES</type>
      <originChamber>Senate</originChamber>
      <originChamberCode>S</originChamberCode>
      <number>415</number>
      <url>https://api.congress.gov/v3/bill/118/sres/415?format=xml</url>
      <title>A resolution to authorize testimony and representation in United States v. Samsel.</title>
      <updateDateIncludingText>2023-10-19T12:43:40Z</updateDateIncludingText>
      <latestAction>
        <actionDate>2023-10-18</actionDate>
        <text>Submitted in the Senate, considered, and agreed to without amendment and with a preamble by Unanimous Consent. (consideration: CR S5082-5083; text: CR S5091)</text>
      </latestAction>
      <updateDate>2023-10-19</updateDate>
    </bill>
  </bills>
  <pagination>
    <count>10503</count>
    <next>https://api.congress.gov/v3/bill/118?offset=2&amp;limit=2&amp;format=xml</next>
  </pagination>
  <request>
    <congress>118</congress>
    <contentType>application/xml</contentType>
    <format>xml</format>
  </request>
</api-root>
Thanks! I started working on the XML adapter today; I'll test it against https://api.congress.gov/.
@blackerby, how do you envision the format of the response? For the endpoint above, e.g., should each row be a string with the XML:

sql> SELECT * FROM "https://api.congress.gov/v3/bill/118#/api-root/bills/bill" LIMIT 1;
bill
----
<bill>
  <congress>118</congress>
  <type>SRES</type>
  ...
</bill>
(1 row in 0.00s)

Or should it return a JSON representation of the data (so it can be processed with the JSON functions in SQLite)? Something like:

sql> SELECT * FROM "https://api.congress.gov/v3/bill/118#/api-root/bills/bill" LIMIT 1;
bill
----
{"congress": 118, "type": "SRES", ...}
(1 row in 0.00s)

Even better, we could explode the payload to columns and have:

sql> SELECT * FROM "https://api.congress.gov/v3/bill/118#/api-root/bills/bill" LIMIT 1;
congress    type    ...  latestAction
----------  ------  ...  --------------------------------------------------------
118         SRES    ...  {"actionDate": "2023-10-18", "text": "Submitted in ..."}
(1 row in 0.00s)
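The third option could look roughly like the following minimal sketch, which assumes a text-only mapping (attributes are ignored): each child of the selected element becomes a column, and any child that has children of its own is serialized to a JSON string so SQLite's JSON functions can still reach into it. `element_to_value` and `explode_row` are hypothetical helpers, not Shillelagh API.

```python
from __future__ import annotations

import json
import xml.etree.ElementTree as ET
from typing import Any


def element_to_value(element: ET.Element) -> Any:
    """Text-only conversion: leaves become stripped strings, parents become dicts.

    Attributes are ignored; if a child tag repeats, only the last one is kept.
    """
    children = list(element)
    if not children:
        return (element.text or "").strip()
    return {child.tag: element_to_value(child) for child in children}


def explode_row(element: ET.Element) -> dict[str, str]:
    """One column per child element; nested children are serialized as JSON."""
    row: dict[str, str] = {}
    for child in element:
        value = element_to_value(child)
        row[child.tag] = value if isinstance(value, str) else json.dumps(value)
    return row


bill = ET.fromstring(
    "<bill><congress>118</congress><type>SRES</type>"
    "<latestAction><actionDate>2023-10-18</actionDate>"
    "<text>Submitted in the Senate</text></latestAction></bill>"
)
print(explode_row(bill))
# {'congress': '118', 'type': 'SRES', 'latestAction': '{"actionDate": "2023-10-18", "text": "Submitted in the Senate"}'}
```

Values stay strings here (`'118'`, not `118` as in the example output above); producing typed columns would need a separate type-inference step.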
Are XML attributes important? Or do we care more about the text?
To your first question, I think the third option (exploding the payload to columns) is the way to go. Then columns with JSON in them (like the

To your second question about XML attributes, I will have use cases in which attributes are important, but they may be specific enough that they require a custom adapter, e.g., for MODS. Is it easy enough to incorporate attribute-processing syntax into the XPath URL fragment?
I'm using <foo bar="baz">hi</foo> What should we map that to?
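One common answer, used for example by the xmltodict library, is to keep attributes under `@`-prefixed keys and the element's own text under `#text`, so `<foo bar="baz">hi</foo>` becomes `{"foo": {"@bar": "baz", "#text": "hi"}}`. The stdlib-only sketch below reproduces that convention for illustration; `element_to_dict` is a hypothetical helper, and unlike xmltodict it does not turn repeated tags into lists (the last one wins) and it ignores tail text.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from typing import Any


def element_to_dict(element: ET.Element) -> Any:
    """Map an element to a value, keeping attributes under '@'-prefixed keys."""
    children = list(element)
    text = (element.text or "").strip()
    if not element.attrib and not children:
        return text  # plain leaf with no attributes: just the text
    result: dict[str, Any] = {f"@{name}": value for name, value in element.attrib.items()}
    for child in children:
        result[child.tag] = element_to_dict(child)  # repeated tags: last one wins
    if text:
        result["#text"] = text  # element's own text alongside attributes/children
    return result


foo = ET.fromstring('<foo bar="baz">hi</foo>')
print({foo.tag: element_to_dict(foo)})  # {'foo': {'@bar': 'baz', '#text': 'hi'}}
```

With this mapping, a text-only adapter is the special case where no element carries attributes, so both behaviors could share one code path behind a setting.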
I hear you on the verbosity concern, but the second option (
I think
@cwegener, thanks for the tip on
@blackerby, I released 1.2.8 with a simple generic XML adapter that only cares about text. If needed, we can later implement a different algorithm that exposes XML attributes, and add a way of specifying which one should be used.
That is great, thanks @betodealmeida. I'll play with the new release this week and open a new issue if/when access to XML attributes becomes a challenge. I'm also looking forward to digging into the commit that added the XML adapter; it seems like a great way to learn.
Is your feature request related to a problem? Please describe.
Currently, I am not able to query XML files using Shillelagh. I have conducted Google searches, searches of project documentation, and a search of the Apache Superset Slack but have not found an off-the-shelf solution.
Describe the solution you'd like
I would like to provide source data for an Apache Superset chart from an XML file that is not on a local file system but is available at a URL.
Describe alternatives you've considered
The only other thing I can think of would be to read the XML file into a Pandas DataFrame and query that (cf. #388 and related Slack conversation), which I admit I have not yet tried.
Additional context
The issue and Slack conversation referenced above established a need for a custom adapter. I expect my issue could be solved by a custom adapter, too. I am happy to put in the learning and work to develop such an adapter, but I want to make sure I'm not missing something obvious before I head down that path.