Faster XML parser #103

Closed
henry2004y opened this issue Sep 26, 2022 · 4 comments

Comments

@henry2004y
Owner

The XML parser is now an I/O bottleneck. The wrappers over libxml2 (LightXML, EzXML) do not provide optimal performance. We could look for a native Julia implementation such as PLists.jl, but that package was built for educational purposes and lacks many features. If we really need a faster parser, the Matlab demos may be a useful starting point for learning how XML parsers work.

@henry2004y
Owner Author

henry2004y commented Oct 11, 2022

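I wrote a pure Julia XML parser based on PLists.jl: https://github.com/henry2004y/Vlasiator.jl/tree/native_xml_parser. It turns out to be much slower than EzXML (a wrapper over libxml2): about 4x in median time (661 μs versus 165 μs), with about 7x the memory (200.08 KiB versus 28.49 KiB).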

Native parser:

julia> @benchmark meta = load(file)
BenchmarkTools.Trial: 7246 samples with 1 evaluation.
 Range (min … max):  648.512 μs …   4.994 ms  ┊ GC (min … max): 0.00% … 77.29%
 Time  (median):     661.275 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   687.207 μs ± 269.506 μs  ┊ GC (mean ± σ):  2.35% ±  5.13%

   ▁█▆▂    ▁                                                     
  ▂████▆▄▄▇█▇▅▄▃▃▃▄▃▃▂▂▂▂▃▃▃▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▂▂▂▂▂▂▂▂ ▃
  649 μs           Histogram: frequency by time          792 μs <

 Memory estimate: 200.08 KiB, allocs estimate: 4555.

EzXML:

julia> @benchmark meta = load(file)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  153.291 μs … 166.695 ms  ┊ GC (min … max): 0.00% … 4.32%
 Time  (median):     165.386 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   295.421 μs ±   4.373 ms  ┊ GC (mean ± σ):  1.68% ± 0.11%

  ▄▇█▇▆▄▃▄▅▄▄▃▂▁                                                ▂
  ████████████████▇█▇▇▆▇▆▆▆▅▇▅▅▆▄▆▇▆▄▇▇▅▅▆▇▅▅▄▇▇▆▄▂▂▇███▇▄▅▂▂▅▇ █
  153 μs        Histogram: log(frequency) by time        378 μs <

 Memory estimate: 28.49 KiB, allocs estimate: 344.

For VTK output, we still use EzXML to generate *.vthb files and LightXML (used by WriteVTK) to generate *.vtk files.

@henry2004y
Owner Author

With some optimization and simplification:

Native parser 2:

julia> @benchmark meta = load(file)
BenchmarkTools.Trial: 8003 samples with 1 evaluation.
 Range (min … max):  580.358 μs …   5.034 ms  ┊ GC (min … max): 0.00% … 78.78%
 Time  (median):     591.478 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   622.061 μs ± 258.956 μs  ┊ GC (mean ± σ):  2.16% ±  4.63%

  ▇█▅▆▆▃▃▄▂▃▄▂▁         ▁                                       ▂
  █████████████████▇▇▇▄███▇▇█▄▇▇▅▄▄▃▆▅▅▅▅▆▅▃▄▄▄▃▅▄▄▄▄▄▄▃▄▄▄▃▂▄▄ █
  580 μs        Histogram: log(frequency) by time        876 μs <

 Memory estimate: 157.36 KiB, allocs estimate: 3498.

This is still far from the performance of libxml2: about 3.6x slower than EzXML in median time (591 μs versus 165 μs). Most of the time and allocations are spent in push!.
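
One common way to reduce the cost of repeated push! calls is to reserve capacity up front with sizehint!. The sketch below is only an illustration of that general technique, not code from the native_xml_parser branch: the function names collect_grow and collect_hinted are hypothetical, and it assumes the number of elements can be estimated before the loop, which may not hold for an XML parser that discovers nodes as it reads.

```julia
# Hypothetical helpers for illustration; not taken from Vlasiator.jl.

# Grow a vector one push! at a time. The underlying buffer is
# reallocated whenever its current capacity is exceeded.
function collect_grow(n::Int)
    out = Int[]
    for i in 1:n
        push!(out, i)
    end
    return out
end

# Same loop, but reserve capacity for n elements first so that the
# push! calls do not need to regrow the buffer.
function collect_hinted(n::Int)
    out = Int[]
    sizehint!(out, n)
    for i in 1:n
        push!(out, i)
    end
    return out
end
```

Both functions return the same vector. To see whether the hint helps in a given case, compare them with @allocated (after one warm-up call each) or with @benchmark from BenchmarkTools.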

@henry2004y
Owner Author

Now there is a new native Julia package XML.jl. Check it out!

@henry2004y henry2004y reopened this Jun 5, 2023
@henry2004y henry2004y mentioned this issue Jun 6, 2023
@henry2004y
Owner Author

We have now switched to the native Julia XML parser. It adds some allocations when loading metadata, but it is generally faster across the test cases.
