Petition to make xscen a dependency of miranda #128
Comments
In practice, I think the first step here would be to:
Afterwards, we could think of using more of xscen's stuff in miranda's data corrections, file saving, etc., merging the common functions along the way. Maybe one day a "data ingestion pipeline" would be some kind of xscen workflow, making use of the yaml config and catalogs. |
Thanks for summarizing these justifications. I don't know much of the internals of xscen, so getting a sense of how they work together now that both projects are beginning to mature (adolescent phase?), this is useful knowledge to have on hand. The common scope for both projects does make it enticing to make one a dependency of the other. While a full installation of Miranda shares a lot in common with the base installation for xscen, there was a lot of work put in to move a lot of these dependencies into separate install recipes. Miranda is light enough to work in a pure python environment, and a lot of effort was made to move the external API-handling functionality (intake, intake-esm, fabric, cdsapi) to being completely optional. I'm very hesitant to call The environment.yml encompasses everything possible at the moment, but there are some modules that still need to be pruned/optimized, so there's a possibility that even more will be removed/made optional. Making xscen a dependency goes against this idea. The primary machinery in Miranda is much lower-level than an all-encompassing workflow system like xscen, and I would counter-propose that Miranda should be seen as a dependency for xscen, or that we should be creating a utility library to synchronize our templates instead
It likely comes as no surprise that I'm really not convinced that this is a good idea. Yes, there are commonalities for both, but I think there's a much better way (and would be willing to place effort in solving this problem more directly). |
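To make the "completely optional" dependency idea above concrete, here is a minimal sketch of one common way to guard optional imports. The helper names (`has_module`, `require`, `OPTIONAL_FEATURES`) and the "remote" grouping are hypothetical and are not taken from miranda's code; only the module names come from the comment above.

```python
import importlib.util


def has_module(name):
    """Return True if a top-level module can be found in this environment."""
    return importlib.util.find_spec(name) is not None


def require(name, extra):
    """Raise a descriptive ImportError when an optional dependency is missing."""
    if not has_module(name):
        raise ImportError(
            f"'{name}' is needed for this feature; "
            f"install the package with its '{extra}' extra."
        )


# Hypothetical grouping: one optional feature and the modules it would need.
OPTIONAL_FEATURES = {
    "remote": ["intake", "intake_esm", "fabric", "cdsapi"],
}


def available_features():
    """Map each optional feature to whether all of its modules are importable."""
    return {
        feature: all(has_module(mod) for mod in modules)
        for feature, modules in OPTIONAL_FEATURES.items()
    }
```

With a guard like this, the base package stays importable in a pure Python environment, and a function that needs, say, `cdsapi` would call `require("cdsapi", "remote")` before using it.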
This is just my opinion, but I think Ouranos users should be prioritized over future external users of Miranda (or xscen for that matter). When deciding what to do, we should keep in mind the current non-developer xscen users. It is not ideal that they should have to go dig into miranda to make xscen work. Also, I think we might be mixing up 2 things:
We might have 2 solutions for those 2 things. |
(@Zeitsperre maybe we should stick to the issue for the discussion, so everyone is able to participate) For the "configuration" issue, one thought and one idea:
As for the "common tooling" issue, my fear is that the solution for duplicate code is to add one more package to maintain. Not sure it's very time-efficient for us. Also, it adds a layer for the user: documentation is spread over one more package. And a final thought, mainly about the dependencies issue. Who works with non-conda virtual environments? Personally, since the arrival of Mamba, I don't think I'll suggest anything else to users, at least in the near and mid future. And conda makes the usage of "optional" dependencies kinda irrelevant: the standard way of doing things is installing everything. Which is to say that it is normal under this convention to have packages that you only use in edge cases, and to have packages with a much larger or barely-overlapping scope. This said, I'll continue thinking about other solutions. To be clear: my primary goal is reducing the workload. |
A potential con for putting the structure in Are we ready to share access to |
The very point of Maybe we are aiming too high with getting the list of columns outside xscen. Only the folder schema could be enough. We'd still have an example one in xscen. |
I thought about it and my preferred solution for the moment is quite status quo-like.
(Moving to gitlab will allow simple "wget" requests to get the file from within the Ouranos network. You can't do that with a private github repo (needs authentication). ) And because I can't stop, another argument I thought of this morning. In the future, when we will want to upgrade code relating to the common tooling of xscen and miranda, chances are that the first implementation will happen in |
Sorry for the delay in this discussion. I don't know enough about I'm specifically thinking of
|
I like the idea, but I see some caveats that might influence the choice: I'm not sure
would become:
with this approach we would be able to make many dependencies optional, the same way miranda does. Note that this would make most sense in a non-conda venv. AFAIK, All that said, I realized yesterday that as long as "frequency" is the field we use in the structure of monthly/daily/hourly paths, |
Innovative solution, but I kind of hate it, haha. I would rather see useful functions migrate to another library than rely on a pseudo-official conda-forge package hack. Here's an alternative: We make a third helper library with all the In

```python
from helper_func.io import *
from helper_func.io import __all__ as __all_io__

__all__ = __all_io__
```

I just performed a cursory review of the dependencies in miranda and there's likely more things that can be safely removed. Having a common package that doesn't move once we have a common set of functions is much easier to manage. For common configurations, I want to propose an approach in spirograph and see whether we can port it to other libraries as well. |
that makes it way harder to read. This is what the beginning of ESPO-G is like right now, because the imports were not working when I wrote it. It caused a lot of confusion for Marco. |
Trevor's solution would prevent that. Given that (Trevor also tells me that 2. is possible) |
Doesn't |
Maybe not. It'll be discussed in 2 weeks during an |
I get anxiety when thinking of adding another package to our workload... I can already barely keep up with xscen and xclim, and I have difficulty giving miranda the attention it deserves. I personally prefer the status quo to this. |
I don't see that library as changing too often; it would be pretty self-contained and stable.
|
The beauty of this is that Miranda can continue to exist as a library that is built for one or two things (optionally, 3). The functions that relate to xscen that you'll be focusing on won't exist in Miranda any more (and I would be proposing enhancements that would affect both).
Even if it does move, the functions that xscen and miranda (and xhydro) require would not. If we have very clean call signatures and returns on functions, then they won't change. New functions can be added to the third party library without impacting existing functions.
We make use of |
Sorry everybody, I am again advocating for the integration of xscen into miranda. This came while working on the new schema...
Here are my arguments.
Both packages need to be able to manage catalogues of datasets, parse directories and build directories. They must share the same column definitions for our work to function. Why split the definitions? Currently, the list of available columns is defined in both places. While miranda has an automatic validator, xscen has human-readable documentation of them. Both have mappings to translate common frequency names onto standardized vocab, while only xscen has mappings to the "xrfreq" codes.
They both have a pretty large "common scope". Miranda takes raw data from different sources and creates nicely formatted files. In theory, xscen takes up from that point. However, in practice most operations miranda does once the files are opened kinda fall in xscen's scope too.
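As a rough illustration of what a single shared definition of columns and frequency mappings could look like, here is a minimal sketch. Every name and value in it (`COLUMNS`, `FREQUENCY_ALIASES`, `FREQUENCY_TO_XRFREQ`, the column names and the codes) is an assumption made for the example; none of it is the actual content of xscen or miranda.

```python
# Illustrative shared definitions; names and values are assumptions,
# not the actual xscen or miranda contents.
COLUMNS = ("id", "source", "variable", "frequency", "xrfreq", "path")

# Common spellings mapped onto one standardized frequency vocabulary.
FREQUENCY_ALIASES = {
    "hourly": "1hr",
    "1h": "1hr",
    "daily": "day",
    "monthly": "mon",
    "yearly": "yr",
}

# Standardized frequency mapped onto a pandas-style offset code ("xrfreq").
FREQUENCY_TO_XRFREQ = {"1hr": "h", "day": "D", "mon": "MS", "yr": "YS"}


def standardize_frequency(name):
    """Translate a common frequency name to the standardized vocabulary."""
    key = name.strip().lower()
    key = FREQUENCY_ALIASES.get(key, key)
    if key not in FREQUENCY_TO_XRFREQ:
        raise ValueError(f"Unknown frequency: {name!r}")
    return key


def to_xrfreq(name):
    """Translate any known frequency spelling to its xrfreq code."""
    return FREQUENCY_TO_XRFREQ[standardize_frequency(name)]


def unknown_columns(record):
    """Return the keys of a catalogue record that are not defined columns."""
    return sorted(key for key in record if key not in COLUMNS)
```

If both packages imported these objects from one place, the validator and the human-readable documentation could be generated from the same source instead of being maintained twice.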
Earlier, a very good argument against was that xscen was private and internal. It isn't anymore, as it's now on anaconda and pypi. And work is already being done to add testing, our criterion before moving xscen to `conda-forge`.

Another good argument was that the dependencies were too different: installing both would result in many useless packages for the other. Since a few months back, miranda now uses `xESMF`. I installed `xscen` from anaconda in an empty env and looked at which dependencies it pulled. The packages that xscen pulled that are not in miranda's env (from `environment.yml`) are:

where `libarchive`, `lzo` and `python-tzdata` are low-level dependencies that were not in miranda's env because of older package versions for `gdal` and `pandas`.

This means the real extras are `cartopy`, `flox` and `rechunker`. The first is there because of the temporary solution to compute grid cell bounds on rotated pole data, waiting for a fix in `cf-xarray`; maybe miranda is interested in this function? I would argue that `flox` should be added to miranda's env for performance improvements. And I'm pretty sure miranda could make use of `rechunker`.
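The dependency comparison described above boils down to a set difference between two package lists. A minimal sketch follows, assuming the two lists have already been extracted as plain package names; the function name `extra_packages` and the sample lists are illustrative only and do not reproduce the real environments.

```python
def extra_packages(pulled, baseline):
    """Return sorted package names present in `pulled` but absent from `baseline`.

    Names are compared case-insensitively, treating "_" and "-" as equivalent.
    """

    def normalise(name):
        return name.strip().lower().replace("_", "-")

    baseline_names = {normalise(name) for name in baseline}
    return sorted({normalise(name) for name in pulled} - baseline_names)


# Illustrative lists only; these are not the actual environment contents.
miranda_env = ["xarray", "xESMF", "pandas", "gdal"]
xscen_env = ["xarray", "xesmf", "pandas", "gdal", "cartopy", "flox", "rechunker"]

extras = extra_packages(xscen_env, miranda_env)
```

Running the same comparison on the real package lists would let anyone reproduce or update the check when either environment changes.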