Last version of Updated_Catalog might discard too many stations #8

Open
tomsail opened this issue Oct 16, 2023 · 2 comments

@tomsail
Collaborator

tomsail commented Oct 16, 2023

Discarding stations that share the same lat/lon, or that are too close to each other, is problematic.
Some reasons (not exhaustive):

  1. some stations may have had multiple deployments over time, from the same or different providers, and are identified by different names
  2. some may have different ioc_codes but be the same station with different instruments (e.g. sete and sete2).

https://github.com/oceanmodeling/seaset/blob/main/Notebooks/Updated_Catalog.ipynb

I suggest:

  • exporting a CSV version of the table when the grouped_df dataframe is obtained, for data retrieval purposes
  • exporting a CSV version at the end of the notebook, as already implemented, for modelling purposes
@tomsail
Collaborator Author

tomsail commented Apr 25, 2024

Update on this: the same problem showed up when comparing model results with observed data from ioc_cleanup (for the prin and prin2 stations).

Actions suggested:

  • Drop the PR I initiated and restart from the official repo
  • Provide a function create_seaset() to initiate a first version of the seaset catalog, without dropping any ioc_code. This function would only need to be run once, to generate a reference catalog.
  • Provide a function update_seaset() to add new stations and assign them a new unique provider, if not already present in the database.
  • Provide a function id, dist = get_seaset_id(lat, lon) that returns the N unique seaset ids for the coordinates entered and, conversely, lat, lon = get_coords(id) that returns the coordinates for a given seaset id.
  • Order notebooks, scripts and functions in the correct directories
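
To make the lookup idea a bit more concrete, here is a rough sketch of what get_seaset_id() and get_coords() could look like. It is only an illustration: the record fields (seaset_id, ioc_code, lat, lon), the placeholder station codes and coordinates, the n parameter and the choice of haversine distance in km are all assumptions, not an agreed API or real catalog entries.

```python
import math

# Hypothetical catalog records: field names, codes and coordinates are
# illustrative assumptions, not real seaset entries.
CATALOG = [
    {"seaset_id": 1, "ioc_code": "aaaa", "lat": 43.40, "lon": 3.70},
    {"seaset_id": 2, "ioc_code": "aaaa2", "lat": 43.40, "lon": 3.70},
    {"seaset_id": 3, "ioc_code": "bbbb", "lat": 10.00, "lon": -61.00},
]

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def get_seaset_id(lat, lon, n=1, catalog=CATALOG):
    """Return (ids, dists): the n nearest seaset ids and their distances in km.

    Co-located stations are all kept, so two instruments at the same
    position simply show up as two ids with the same distance.
    """
    ranked = sorted(
        (haversine_km(lat, lon, rec["lat"], rec["lon"]), rec["seaset_id"])
        for rec in catalog
    )[:n]
    ids = [sid for _, sid in ranked]
    dists = [dist for dist, _ in ranked]
    return ids, dists


def get_coords(seaset_id, catalog=CATALOG):
    """Return (lat, lon) for a seaset id, or raise KeyError if unknown."""
    for rec in catalog:
        if rec["seaset_id"] == seaset_id:
            return rec["lat"], rec["lon"]
    raise KeyError(seaset_id)
```

With something like this, get_seaset_id(43.40, 3.70, n=2) would return both co-located placeholder stations instead of silently keeping only one of them, which is the behaviour this issue is asking for.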

@pmav99
Member

pmav99 commented Apr 25, 2024

Assuming that seaset is a GeoDataFrame (or at least a DataFrame with lon/lat columns), we should think carefully about which functions are really needed, because some of them are trivial to implement. For example, get_coords(id) is just seaset[seaset.id==id].geometry or similar.
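
For the plain-DataFrame case, a minimal sketch of that one-liner could look like the following. The column names (id, lon, lat) and the sample rows are assumptions made up for illustration; with a GeoDataFrame, the same boolean mask would be followed by .geometry instead of selecting the lon/lat columns.

```python
import pandas as pd

# Hypothetical stand-in for the seaset catalog: column names and values
# are assumed for illustration only.
seaset = pd.DataFrame(
    {
        "id": [1, 2, 3],
        "lon": [3.70, 3.70, -61.00],
        "lat": [43.40, 43.40, 10.00],
    }
)


def get_coords(seaset_id):
    """Return (lon, lat) for one seaset id using a boolean mask."""
    row = seaset.loc[seaset["id"] == seaset_id, ["lon", "lat"]]
    if len(row) != 1:
        raise KeyError(
            f"expected exactly one row for id={seaset_id}, found {len(row)}"
        )
    return tuple(row.iloc[0])
```

The only thing the wrapper adds over the bare mask is the check that the id matches exactly one row, which may or may not be worth a dedicated function.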
