SweetXml is a thin wrapper around :xmerl. It allows you to convert a char_list or an xmlElement record, as defined in :xmerl, into an Elixir value such as a map, list, string, integer, float, or any combination of these.
Add the dependency to your project's mix.exs:
def deps do
[{:sweet_xml, "~> 0.6.6"}]
end
SweetXml depends on :xmerl. On some Linux systems, you might need to install the package erlang-xmerl.
Given an XML document such as below:
<?xml version="1.05" encoding="UTF-8"?>
<game>
  <matchups>
    <matchup winner-id="1">
      <name>Match One</name>
      <teams>
        <team>
          <id>1</id>
          <name>Team One</name>
        </team>
        <team>
          <id>2</id>
          <name>Team Two</name>
        </team>
      </teams>
    </matchup>
    <matchup winner-id="2">
      <name>Match Two</name>
      <teams>
        <team>
          <id>2</id>
          <name>Team Two</name>
        </team>
        <team>
          <id>3</id>
          <name>Team Three</name>
        </team>
      </teams>
    </matchup>
    <matchup winner-id="1">
      <name>Match Three</name>
      <teams>
        <team>
          <id>1</id>
          <name>Team One</name>
        </team>
        <team>
          <id>3</id>
          <name>Team Three</name>
        </team>
      </teams>
    </matchup>
  </matchups>
</game>
We can do the following:
import SweetXml
doc = "..." # as above
Get the name of the first match:
result = doc |> xpath(~x"//matchup/name/text()") # `sigil_x` for (x)path
assert result == 'Match One'
Get the XML record of the name of the first match:
result = doc |> xpath(~x"//matchup/name"e) # `e` is the modifier for (e)ntity
# The remaining record fields (xmlbase and elementdef) are matched with `_`.
assert {:xmlElement, :name, :name, [], {:xmlNamespace, [], []},
        [matchup: 2, matchups: 2, game: 1], 2, [],
        [{:xmlText, [name: 2, matchup: 2, matchups: 2, game: 1], 1, [],
          'Match One', :text}], [],
        _xmlbase, _elementdef} = result
Get the full list of matchup names:
result = doc |> xpath(~x"//matchup/name/text()"l) # `l` stands for (l)ist
assert result == ['Match One', 'Match Two', 'Match Three']
Get a list of winner-id attributes:
result = doc |> xpath(~x"//matchup/@winner-id"l)
assert result == ['1', '2', '1']
Get a list of matchups with a different map structure:
result = doc |> xpath(
  ~x"//matchups/matchup"l,
  name: ~x"./name/text()",
  winner: [
    ~x".//team/id[.=ancestor::matchup/@winner-id]/..",
    name: ~x"./name/text()"
  ]
)
assert result == [
  %{name: 'Match One', winner: %{name: 'Team One'}},
  %{name: 'Match Two', winner: %{name: 'Team Two'}},
  %{name: 'Match Three', winner: %{name: 'Team One'}}
]
Or directly return a mapping of your liking:
result = doc |> xmap(
  matchups: [
    ~x"//matchups/matchup"l,
    name: ~x"./name/text()",
    winner: [
      ~x".//team/id[.=ancestor::matchup/@winner-id]/..",
      name: ~x"./name/text()"
    ]
  ],
  last_matchup: [
    ~x"//matchups/matchup[last()]",
    name: ~x"./name/text()",
    winner: [
      ~x".//team/id[.=ancestor::matchup/@winner-id]/..",
      name: ~x"./name/text()"
    ]
  ]
)
assert result == %{
  matchups: [
    %{name: 'Match One', winner: %{name: 'Team One'}},
    %{name: 'Match Two', winner: %{name: 'Team Two'}},
    %{name: 'Match Three', winner: %{name: 'Team One'}}
  ],
  last_matchup: %{name: 'Match Three', winner: %{name: 'Team One'}}
}
In the examples above, we used the expression ~x"//some/path" to define the path, because it allows us to specify more precisely what is being returned:
- ~x"//some/path"
  Without any modifiers, xpath/2 will return the value of the entity if the entity is of type xmlText, xmlAttribute, xmlPI or xmlComment, as defined in :xmerl.
- ~x"//some/path"e
  'e' stands for (e)ntity. This forces xpath/2 to return the entity, with which you can further chain your xpath/2 calls.
- ~x"//some/path"l
  'l' stands for (l)ist. This forces xpath/2 to return a list. Without l, xpath/2 will only return the first element of the match.
- ~x"//some/path"k
  'k' stands for (k)eyword. This forces xpath/2 to return a Keyword instead of a Map.
- ~x"//some/path"el
  A mix of the above.
- ~x"//some/path"s
  's' stands for (s)tring. This forces xpath/2 to return the value as a string instead of a char list.
- ~x"//some/path"S
  'S' stands for soft (S)tring. This forces xpath/2 to return the value as a string instead of a char list, but if the node content is incompatible with a string, the value is set to "".
- ~x"//some/path"o
  'o' stands for (o)ptional. This allows the path to not exist, in which case nil is returned.
- ~x"//some/path"sl
  A string list.
- ~x"//some/path"i
  'i' stands for (i)nteger. This forces xpath/2 to return the value as an integer instead of a char list.
- ~x"//some/path"I
  'I' stands for soft (I)nteger. This forces xpath/2 to return the value as an integer instead of a char list, but if the node content is incompatible with an integer, the value is set to 0.
- ~x"//some/path"f
  'f' stands for (f)loat. This forces xpath/2 to return the value as a float instead of a char list.
- ~x"//some/path"F
  'F' stands for soft (F)loat. This forces xpath/2 to return the value as a float instead of a char list, but if the node content is incompatible with a float, the value is set to 0.0.
- ~x"//some/path"il
  An integer list.
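Several of the modifiers above can be tried on one small document. The sketch below is illustrative only. The `<scores>` document is hypothetical, and the script assumes it is run standalone with `Mix.install` (Elixir 1.12 or later) against sweet_xml `~> 0.7`, not the `~> 0.6.6` pinned in the deps entry.

```elixir
# Assumption: standalone script; Mix.install fetches sweet_xml from Hex.
# In a Mix project, rely on the deps entry in mix.exs instead.
Mix.install([{:sweet_xml, "~> 0.7"}])

import SweetXml

# Hypothetical sample document (not from the examples above).
doc = """
<scores>
  <score team="One">5</score>
  <score team="Two">2.5</score>
</scores>
"""

# No modifier: only the first match, returned as a char list.
first_raw = xpath(doc, ~x"//score/text()")

# s: a string instead of a char list; sl: a list of strings.
first_str = xpath(doc, ~x"//score/text()"s)
teams = xpath(doc, ~x"//score/@team"sl)

# i and f: cast the text content to an integer or a float.
first_int = xpath(doc, ~x"//score/text()"i)
second_float = xpath(doc, ~x"//score[@team='Two']/text()"f)

# so: optional string; a path that does not exist yields nil.
missing = xpath(doc, ~x"//bonus/text()"so)

# k: the subspec result is a keyword list instead of a map.
kw = xpath(doc, ~x"//scores"k, first: ~x"./score/text()"s)

IO.inspect({first_raw, first_str, teams, first_int, second_float, missing, kw})
```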
If you use the optional modifier o together with a soft cast modifier (uppercase), then the value is set to nil when it is not compatible. For instance, ~x"//some/path/text()"Fo returns nil if the text is not a number.
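The difference between a soft cast and a soft cast combined with o can be seen side by side in the following sketch. It uses a hypothetical `<price>` document whose text is not a number, and it assumes a standalone script with `Mix.install` and sweet_xml `~> 0.7`. The expected values follow the rules stated above.

```elixir
# Assumption: standalone script; Mix.install fetches sweet_xml from Hex.
Mix.install([{:sweet_xml, "~> 0.7"}])

import SweetXml

# Hypothetical document whose text cannot be parsed as a number.
doc = "<price>unknown</price>"

# Soft casts (uppercase) fall back to a default value when parsing fails...
soft_int = xpath(doc, ~x"//price/text()"I)
soft_float = xpath(doc, ~x"//price/text()"F)

# ...and adding the optional modifier `o` turns that fallback into nil.
opt_int = xpath(doc, ~x"//price/text()"Io)
opt_float = xpath(doc, ~x"//price/text()"Fo)

IO.inspect({soft_int, soft_float, opt_int, opt_float})
```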
Also, in the examples above, we always import SweetXml first. This makes sigil_x available in the current scope. Without it, instead of using ~x, you can use the %SweetXpath struct:
assert ~x"//some/path"e == %SweetXpath{path: '//some/path', is_value: false, is_list: false, cast_to: false}
Note the use of char_list in the path definition.
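The following sketch calls `SweetXml.xpath/2` with a hand-built struct and no `import`. It assumes that setting `is_list: true` and `cast_to: :string` mirrors what the `sl` modifiers would produce, and that the other struct fields keep their defaults. The one-line document is hypothetical, and the script is assumed to run standalone with `Mix.install` and sweet_xml `~> 0.7`.

```elixir
# Assumption: standalone script; Mix.install fetches sweet_xml from Hex.
Mix.install([{:sweet_xml, "~> 0.7"}])

# No `import SweetXml` here, so the ~x sigil is not in scope.
doc = "<r><name>a</name><name>b</name></r>"

# Intended to mirror ~x"//name/text()"sl: a list of values cast to strings.
# Note the char list in the path, as in the struct example above.
spec = %SweetXpath{path: ~c"//name/text()", is_value: true, is_list: true, cast_to: :string}

names = SweetXml.xpath(doc, spec)
IO.inspect(names)
```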
Given an XML document that uses namespaces, such as below:
<?xml version="1.05" encoding="UTF-8"?>
<game xmlns="http://example.com/fantasy-league" xmlns:ns1="http://example.com/baseball-stats">
  <matchups>
    <matchup winner-id="1">
      <name>Match One</name>
      <teams>
        <team>
          <id>1</id>
          <name>Team One</name>
          <ns1:runs>5</ns1:runs>
        </team>
        <team>
          <id>2</id>
          <name>Team Two</name>
          <ns1:runs>2</ns1:runs>
        </team>
      </teams>
    </matchup>
  </matchups>
</game>
We can do the following:
import SweetXml
xml_str = "..." # as above
doc = parse(xml_str, namespace_conformant: true)
Note that we explicitly parse the XML with the namespace_conformant: true option. This is needed so that nodes can be identified in a prefix-independent way.
We can use namespace prefixes of our preference, regardless of what prefix is used in the document:
result = doc
|> xpath(~x"//ff:matchup/ff:name/text()"
|> add_namespace("ff", "http://example.com/fantasy-league"))
assert result == 'Match One'
We can specify multiple namespace prefixes:
result = doc
|> xpath(~x"//ff:matchup//bb:runs/text()"
|> add_namespace("ff", "http://example.com/fantasy-league")
|> add_namespace("bb", "http://example.com/baseball-stats"))
assert result == '5'
Here's a brief explanation of how nesting came about. Both xpath and xmap can take an :xmerl XML record as the first argument. Therefore you can chain calls to these functions like below:
doc
|> xpath(~x"//li"l)
|> Enum.map(fn li_node ->
  %{
    name: li_node |> xpath(~x"./name/text()"),
    age: li_node |> xpath(~x"./age/text()")
  }
end)
Since the previous example is such a common use case, SweetXml allows you to simply do the following:
doc
|> xpath(
  ~x"//li"l,
  name: ~x"./name/text()",
  age: ~x"./age/text()"
)
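The equivalence between the two forms can be checked on a concrete input. The sketch below builds the same list of maps twice, first by chaining xpath/2 calls by hand and then with the keyword shorthand. The `<ul>`/`<li>` document is hypothetical, and the `s`/`i` casts are added only so the results are easy to compare. It assumes a standalone script with `Mix.install` and sweet_xml `~> 0.7`.

```elixir
# Assumption: standalone script; Mix.install fetches sweet_xml from Hex.
Mix.install([{:sweet_xml, "~> 0.7"}])

import SweetXml

# Hypothetical list document for illustration.
doc = """
<ul>
  <li><name>Ann</name><age>31</age></li>
  <li><name>Bob</name><age>42</age></li>
</ul>
"""

# Manual chaining: select every <li> node, then query each node.
manual =
  doc
  |> xpath(~x"//li"l)
  |> Enum.map(fn li_node ->
    %{
      name: xpath(li_node, ~x"./name/text()"s),
      age: xpath(li_node, ~x"./age/text()"i)
    }
  end)

# Shorthand: the keyword subspec is applied to every matched node.
shorthand =
  xpath(doc, ~x"//li"l,
    name: ~x"./name/text()"s,
    age: ~x"./age/text()"i
  )

IO.inspect({manual, shorthand})
```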
Sometimes what you want is more complex than that, so SweetXml also allows nesting:
doc
|> xpath(
  ~x"//li"l,
  name: [
    ~x"./name",
    first: ~x"./first/text()",
    last: ~x"./last/text()"
  ],
  age: ~x"./age/text()"
)
Sometimes we need to transform a value into something else; SweetXml supports that via transform_by/2:
doc = "<li><name><first>john</first><last>doe</last></name><age>30</age></li>"
result = doc |> xpath(
  ~x"//li"l,
  name: [
    ~x"./name",
    first: ~x"./first/text()"s |> transform_by(&String.capitalize/1),
    last: ~x"./last/text()"s |> transform_by(&String.capitalize/1)
  ],
  age: ~x"./age/text()"i
)
^result = [%{age: 30, name: %{first: "John", last: "Doe"}}]
The same can be used to break parsing code into reusable functions that can be used in nesting:
doc = "<li><name><first>john</first><last>doe</last></name><age>30</age></li>"
parse_name = fn xpath_node ->
  xpath_node |> xmap(
    first: ~x"./first/text()"s |> transform_by(&String.capitalize/1),
    last: ~x"./last/text()"s |> transform_by(&String.capitalize/1)
  )
end
result = doc |> xpath(
  ~x"//li"l,
  name: ~x"./name" |> transform_by(parse_name),
  age: ~x"./age/text()"i
)
^result = [%{age: 30, name: %{first: "John", last: "Doe"}}]
For more examples, please take a look at the tests and help.
SweetXml now also supports streaming in various forms. Here's a sample XML doc. Notice that certain lines have XML tags that span multiple lines:
<?xml version="1.05" encoding="UTF-8"?>
<html>
<head>
<title>XML Parsing</title>
<head><title>Nested Head</title></head>
</head>
<body>
<p>Neato €</p><ul>
<li class="first star" data-index="1">
First</li><li class="second">Second
</li><li
class="third">Third</li>
</ul>
<div>
<ul>
<li>Forth</li>
</ul>
</div>
<special_match_key>first star</special_match_key>
</body>
</html>
Working with streams is exactly the same as working with binaries:
File.stream!("file_above.xml") |> xpath(...)
Once you have a file stream, you may not want to work with the entire document to save memory:
file_stream = File.stream!("file_above.xml")
result = file_stream
|> stream_tags([:li, :special_match_key])
|> Stream.map(fn
{_, doc} ->
xpath(doc, ~x"./text()")
end)
|> Enum.to_list
assert result == ['\n First', 'Second\n ', 'Third', 'Forth', 'first star']
Warning: with large documents, you may want to use the discard option, which drops the listed tags once they have been processed, to avoid memory leaks.
result = file_stream
|> stream_tags([:li, :special_match_key], discard: [:li, :special_match_key])
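Here is a complete version of the discard pipeline. It writes a small, hypothetical XML file to the system temp directory, streams each `<item>` tag with `discard` enabled, and collects the text of each tag as a string. The file name and contents are assumptions made for the demo, as is running it as a standalone script with `Mix.install` and sweet_xml `~> 0.7`.

```elixir
# Assumption: standalone script; Mix.install fetches sweet_xml from Hex.
Mix.install([{:sweet_xml, "~> 0.7"}])

import SweetXml

# Hypothetical demo file written to the system temp directory.
path = Path.join(System.tmp_dir!(), "sweet_xml_stream_demo.xml")
File.write!(path, "<items><item>a</item><item>b</item><item>c</item></items>")

values =
  path
  |> File.stream!()
  # Emit each <item> as it is parsed; `discard` keeps processed
  # <item> tags from accumulating in memory.
  |> stream_tags([:item], discard: [:item])
  |> Stream.map(fn {_tag, node} -> xpath(node, ~x"./text()"s) end)
  |> Enum.to_list()

File.rm!(path)
IO.inspect(values)
```

Because the pipeline is lazy until `Enum.to_list/1`, only the tags selected by `stream_tags` are turned into xmerl records.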
Copyright (c) 2014, Frank Liu
SweetXml source code is licensed under the MIT License.