Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Discusion about API structure (and comparisons with pygmt) #689

Closed
adigitoleo opened this issue Sep 10, 2021 · 17 comments
Closed

Discusion about API structure (and comparisons with pygmt) #689

adigitoleo opened this issue Sep 10, 2021 · 17 comments

Comments

@adigitoleo
Copy link

adigitoleo commented Sep 10, 2021

This is a sort of meta-issue where I will try to summarize my thoughts about this package. I will add links to relevant trackers at the bottom as they appear.

As the first priority, I think this package can benefit from following pygmt more closely. Of course, Julia is not Python and subsequently there will be implementation details that need to change. However, using a similar structure to pygmt could be beneficial for a few reasons, at least it would be

  • easier for pygmt users to switch, if they want to use Julia, and
  • easier for pygmt contributors to help, if they want to learn Julia.

I think the second point is important since it's a fairly niche package and finding contributors/developers with enough knowledge of gmt is not easy. In some areas, this package already offers more aliases and expressive syntax compared to pygmt (at least from my quick skim of the pygmt docs). I'm not suggesting to remove it, but I think adding new aliases is not a priority for now (even though I opened a few requests before 😄 ). Instead, we could focus on:

  • completing the "monolithic" mode low-level interface (earlier errors? more ways to pass in data?),
  • adding types like Figure etc. and making functions return (containers of) gmt commands instead of running them, to allow for easier incremental plotting
  • making call signatures match pygmt where possible (i.e. where it doesn't unnecessarily complicate Julia's dispatch system)
  • reorganising the source code to follow pygmt structure more closely in general

I'll start to play around with the implementations when I find the time, and I also need to dig into pygmt more because I've never used it before. Let me know if it sounds like a reasonable idea.

Cheers

@joa-quim
Copy link
Member

First let me thank you for this deep thinking about the GMT.jl structure but I confess that I'm failing to to follow your points. First two are easy to reply, last two, less.

stabilising/completing the "monolithic" mode, i.e. the thin fallback wrapper for normal gmt (I don't remember the exact issues I had, but I think it was mostly about I/O),

What is it that needs to stabilize/complete in the monolithic mode? All modules end up calling the monolithic mode under the hood. And pygmt does not use monolithic mode at all.

adding types like Figure etc. and making functions return gmt commands instead of running them, to allow for easier incremental plotting

Just use Vd=2. Again, pygmt does not have this facility.

making call signatures match pygmt where possible (i.e. where it doesn't unnecessarily complicate Julia's dispatch system)

Sorry, don't follow you here. Maybe some example?

reorganising the source code to follow pygmt structure more closely in general

Funny, there is an issue somewhere in pygmt where they wanted to follow the GMT.jl structure. But aside from it, I also don't see what you mean here. Note that GMT.jl is way more feature reach that pygmt with much code that extends GMT capabilities (think on imshow, gmtread, gdal funs, proj4 funs, ogr2GMTdataset, histogram, contourf, ternary plots and others) so what that structure would be?

@adigitoleo
Copy link
Author

adigitoleo commented Sep 11, 2021

You're right, it's not very clear without examples, I probably need to organize the ideas a bit more but maybe this will help:

What is it that needs to stabilize/complete in the monolithic mode

Stabilize was not the right word (and on second thought it's not really about monolithic mode either), but we could think about things like thread-safety (#564), how to detect some more errors before calling gmt and some easier ways to pas in-memory data from julia to the GMT command without writing files. I confess I haven't used the GMTgrid types enough, maybe that is what I was looking for, but I ended up just running gmt as a Julia Cmd for some things since it was simple to just hcat some data:

using GeoJSON
using DelimitedFiles
# Not a real file, just for illustration
boundaries = open("/file/with/GeoJSON_plate_boundaries.json") do file
    geo2dict(GeoJSON.read(file))
end
for data in boundaries["features"]
    open(`gmt plot -Wthinner,white`, write = true) do io
        writedlm(io, collect(transpose(reduce(hcat, data["geometry"]["coordinates"]))))
    end
end

You're right, such a gmt function is not in pygmt, I didn't check the pygmt implementation enough :)

Just use Vd=2

That is very useful, but here I was talking about assigning some sort of Figure instance where the plotting commands can be built up incrementally, like in e.g. matplotlib, Plots.jl etc. In pygmt there is a Figure class for example:

import pygmt
fig = pygmt.Figure()
fig.basemap(region=[-90, -70, 0, 20], projection="M15c", frame=True)
fig.coast(shorelines=True)
fig.show()

This could tie into the thread-safety thing, since we could assign one instance of the gmt API to each figure, for example.

About call signatures, I was mainly just thinking to try and keep the keyword argument names/types similar to pygmt, I'll need to check for specific examples.

The last point was nonsense, I initially found the pygmt code easier to follow but after looking again I think it was just because I'm more familiar with Python. As you mentioned, there is a lot of extra functionality here: maybe it can be shifted to a submodule? But again, it's just subjective.

@adigitoleo adigitoleo changed the title Follow pygmt structure more closely Discusion about API structure (and comparisons with pygmt) Sep 11, 2021
@adigitoleo
Copy link
Author

Maybe I can try to list some more concrete points:

  • I didn't pay enough attention to the available grid manipulation tools, e.g. grdcut, which are awesome and probably what I need. The only suggestion here is for another method or two of gmtread that can read from IO or NCDataset or similar, so if I have some data loaded with NCDatasets.jl for example, I can read that into GMT.jl directly.
  • In pygmt they have some custom error types which could make control flow easier, and current error messages often refer to the gmt commandline syntax which could be confusing (maybe just adding a notice above the gmt message telling to compare it against the output from Vd=2 could be enough)
  • Figure type for storing references to figure instances (and maybe individual API sessions)
  • Consistent keywords with pygmt, e.g. for pscoast we have area (gmt -A) where they have area_thresh, our gmt -C is river_fill and theirs is lakes... (maybe these "long-form" names can become part of an upstream GMT spec?)

@joa-quim
Copy link
Member

using GeoJSON
using DelimitedFiles
# Not a real file, just for illustration
boundaries = open("/file/with/GeoJSON_plate_boundaries.json") do file
    geo2dict(GeoJSON.read(file))
end
for data in boundaries["features"]
    open(`gmt plot -Wthinner,white`, write = true) do io
        writedlm(io, collect(transpose(reduce(hcat, data["geometry"]["coordinates"]))))
    end
end

Have you tried?

boundaries = gmtread("/file/with/GeoJSON_plate_boundaries.json");
imshow(boundaries)
or even
imshow(boundaries, proj=:guess)

@joa-quim
Copy link
Member

joa-quim commented Sep 11, 2021

import pygmt
fig = pygmt.Figure()
fig.basemap(region=[-90, -70, 0, 20], projection="M15c", frame=True)
fig.coast(shorelines=True)
fig.show()

Why would that be nicer then?

basemap(region=[-90, -70, 0, 20], projection="Mercator")
coast!(shorelines=true, show=true)

GMT.jl follows the Julia standards of Plots.jl and others where we use the ! form when one want to append to a figure.
But it is also possible to use the GMT modern mode style where one starts with a gmtbegin() and end with a gmtend(). Under the hood this is what pygmt does for its Figure style.

@joa-quim
Copy link
Member

joa-quim commented Sep 11, 2021

About call signatures, I was mainly just thinking to try and keep the keyword argument names/types similar to pygmt, I'll need to check for specific examples.

GMT.jl exists for quite sometime before PyGMT (for some modules it predates it for several years). For most of cases the the keyword names are equal but for others they decided to call it differently. Sometimes out of need, e.g series instead of range for -T option (in makecpt for example) because range is a reserved word in python. Other cases for choice like perspective instead of view (-p). Though perspective is also accepted in GMT.jl I clearly prefer view and that's what the examples in docs use.

@joa-quim
Copy link
Member

The only suggestion here is for another method or two of gmtread that can read from IO or NCDataset or similar, so if I have some data loaded with NCDatasets.jl for example, I can read that into GMT.jl directly.

It's quite likely that that you can use GMT.jl directly because either GMT and GDAL (most of it is accessible via GMT.jl) can access data subdatasets. Please provide an example of your need. Maybe in a different issue (this touches many points) or post the question in the forum.

@joa-quim
Copy link
Member

joa-quim commented Sep 11, 2021

how to detect some more errors before calling gmt

As a rule I try to catch the errors before sending the command to GMT, but ofc the situation is far from perfect. I guess that we have to advance in this based on a case by case basis. If you have examples of errors that should/could be catch in the Julia side, please open issues with them.

and some easier ways to pas in-memory data from julia to the GMT command without writing files.

Very, very rarely GMT.jl resorts to the trick of writing temporary files and I hate when I have to do that. The memory management of grids and images is quite complicated. Julia is column-major whilst GMT, being a C lib, has arrays in row-major. This means that one must do transposes for communicating between the two. Add to it the terrible pad that basically obliges to make a copy of the array. However, GMT.jl tries a lot to avoid it by communicating with GMT via external memory objects and when grids were read via GDAL it uses the C memory layout to avoid that copy when possible. Dealing with this is awfully complicated and the situation is still far from what it ideally can become (for example always using GDAL to do projections and pass the projected array to GMT, thus avoiding any copy at all). From what I (think I) know, pygmt always duplicates the input grids or images.

@adigitoleo
Copy link
Author

Thanks for the discussion, and just to be clear I don't mean to overwhelm you with idle requests: I'm excited about this package and I was wanting to get a sense for the kind of things that I could help with (or if such help is needed).

boundaries = gmtread("/file/with/GeoJSON_plate_boundaries.json")

This fails with a MethodError from GMTdataset, because GMT does not support the GeoJSON file format. That's why I was using the GeoJSON module to load the data in this example, which works fine, but then to pass that data to GMT.jl was the part where I got stuck. I'll look into GDAL more, it seems they have a lot of readers. In other cases, I have data tables coming out of direct processing in Julia, so that's where the idea for reading from in-memory structures came from. But I will first try mat2ds in in that case.

When I said "writting files" it's because that was what I have been doing instead of learnig to use mat2ds 😄, but you're right, I appreciate that GMT.jl tries to avoid it if possible.


basemap(region=[-90, -70, 0, 20], projection="Mercator")
coast!(shorelines=true, show=true)`

It's not quite the same: in pygmt the figure instance can be passed around inside the code and dataset plots can be added to it at a later stage. In a similar way, the Plots.jl functions return some kind of handle that can be captured (by default, plot! etc. append to the current() figure, but it's possible to manipulate different figures if it suits the workflow). It's sometimes also nice when working in the julia shell. Consider this very rough mockup:

b = basemap(region=[-90, -70, 0, 20], projection="Mercator")
c = coast!(b, shorelines=true)
# I wonder if it looks correct?
showfig(c)
# Actually, I want two maps
plt =  subplot(b, c, grid=(1, 2))
# OK now we can save it
showfig(plt)  # Figure is saved in current directory, could be improved by having a savefig() or similar

Yes, this would require some more thought about how to integrate with gmtbegin() and gmtend().


GMT.jl exists for quite sometime before PyGMT

Ah, I didn't know. It's unfortunate that they have used some different conventions, but it's not a big issue. I'll leave this out of discussion unless someone suggests changes for specific cases.

@joa-quim
Copy link
Member

joa-quim commented Sep 12, 2021

No bother, you are having only one request: the Figure story (I'll get to it) and I truly appreciate the discussion and your willing to help (there are lots of places where it can be used).

It is not well known but GMT can read the so called ogr formats (e.g. GeoJSON) but pure GMT does not know what to do with that data, so in practical terms it's as if it could not read them. However, thank to Julia's ccall mechanism, GMT.jl can decode the contents of the data pointer in the Julia side. So it should be able to read your file. Perhaps you can open a separate issue with this and if possible with a link to the data.


Now the Figure issue. You are right about on how a figure is handled in Plots. I don't really understand how that works but it and all other plotting systems somehow hold the parts of the figures in memory and only assemble it at the end. This allows for example to automatically extend the plot limits if a larger dataset is added later. But GMT works very differently. Each plotting command is immediately executed and added to what we call the postscript layer cake (this makes it that you cannot later change the map limits set in the first command). When the modern mode is used (i.e. after a gmtbegin()) the postscript and other auxiliary files are written in ~/.gmt/sessions/sessiondir_somecode and that some code is determined by the process PID. I'm not 100% sure but I highly doubt that pygmt can handle more than one Figure instance at same time. That would require a tight control over the active PID and knowing how to switch them inside the GMT C lib. And if only one Figure can be trailed around, then it makes no difference if you call it Figure or append new layers with the ! convention, which also can be done at a later stage. As long as a show or figname argument is not used the figure remains open to receive new layers.

That said, it does not seem impossible to implement your mockup case and I also would like to use subplots like in your example. But subplots are modern mode only and we would fall in the PID issue. The alternative would be to get a way to keep track to which postscript file we are appending and then at the end create a final ps file (more than duplicating the ps contents) using the module image (that before was known as psimage). Looks doable, but not trivial.

@adigitoleo
Copy link
Author

adigitoleo commented Sep 13, 2021

About reading data:

I found that I can kind of read GeoJSON with data = gmtread("/some/geoJSON/file.json", data=true), but the JSON data is just put into a vector of strings in data[1].text, so I can't easily use that GMTdataset for plotting. Thanks for linking the padding issue before, it's quite helpful to understand why the data communication is so complicated (more than I expected). I will see if I can work out ways of constructing GMTdatasets from Julia dicts/arrays and plotting them.

(update: I've got some things working using mat2ds now, so that's nice. I'll try to collect some good examples for adding to the docs)

About Figure:

The alternative would be to get a way to keep track to which postscript file we are appending

Interesting, I'll need to think more about it. I had another crazy idea: what about not running the GMT session at all until showfig is called? So we only need to track some sort of string(s) of the GMT command (similar to Vd=2) and then modify this underlying GMT "code" instead? But I can understand that maybe that would be too complicated, since the logic for constructing the GMT commands is already pretty advanced even without that kind of modification...

@joa-quim
Copy link
Member

Are you using an old GMT.jl version? Because *.json files are detected automatically
https://github.com/GenericMappingTools/GMT.jl/blob/master/src/gmtreadwrite.jl#L174
Otherwise you should use ogr=true and not data=true that is for reading plain data tables.

About Figure

The problem with executing all at once in the end is that we would be forced to keep alive all instances of the data that each of the partial commands expect. But if main goal is subplot, there is a subplot module that lets you work on the different panels in the order you want and even swapping back and forth between them. And in modern mode one can work in more than one figure and also leave one unfinished, do something else, and coming back later to finish it. I just not sure I ported this option and if I did, if it works well.

@adigitoleo
Copy link
Author

Hmm, actually my ?gmtread does say that as well, but when I try I get the MethodError (also when I try with ogr=true)... I'll investigate some more and maybe open a specific issue for that.

For now I managed to do it with mat2ds and it's a lot simpler than I thought: https://forum.generic-mapping-tools.org/t/using-julia-to-plot-a-geojson-coordinate-matrix-gmt-jl-vs-gmt-command/2080

I'll test out the modern mode some more then as well.

@joa-quim
Copy link
Member

Please post the full error trace. Still wanted to understand why the format is not detected automatically for you.

@adigitoleo
Copy link
Author

#693

@adigitoleo
Copy link
Author

I think I'll close this since there was still a lot of confusion from my side, and most of the things I thought about were either solved by the PR that fixed gmtread, or by better understanding of GMT/GDAL, or they require dedicated issues with better examples. I think you're right about the Figure idea: there's not a nice and easy way to do it for now. I'm currently trying out the modern mode for some new maps, I'll see how it goes :)

@joa-quim
Copy link
Member

If you are playing for fun with the modern mode (and no mode), consider keeping some of your experiments to expand the gallery section. It really needs a better organization and much more examples.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants