
NO2 vs Temperature #44

Open · wants to merge 4 commits into main

Conversation

Pratichhya
Contributor

This is a follow-up to the "NO2 emission and COVID Lockdown effects" notebook, in which we explored NO2 concentrations during and after the COVID lockdowns. During that analysis, we noticed a distinct pattern of rise and fall in NO2 concentration with changes in temperature.

Hence, in this notebook we want to explore the correlation between NO2 concentration and changes in temperature. For this use case we use the NO2 layer of Sentinel-5P and the Land Surface Temperature (LST) layer of Sentinel-3 SLSTR.
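
For readers who want to see the data loading at a glance, below is a minimal sketch with the openEO Python client; the backend URL, temporal extent and band names are illustrative assumptions, not the exact values used in the notebook.

```python
import openeo

# Connect to an openEO backend (URL is illustrative).
connection = openeo.connect("https://openeo.cloud").authenticate_oidc()

# NO2 from Sentinel-5P; the band name "NO2" is an assumption.
no2 = connection.load_collection(
    "SENTINEL_5P_L2",
    temporal_extent=["2020-01-01", "2021-12-31"],
    bands=["NO2"],
)

# Land Surface Temperature from Sentinel-3 SLSTR; the band name "LST" is an assumption.
lst = connection.load_collection(
    "SENTINEL3_SLSTR_L2_LST",
    temporal_extent=["2020-01-01", "2021-12-31"],
    bands=["LST"],
)
```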

@Pratichhya requested a review from soxofaan on January 22, 2024 13:19
@jdries
Contributor

jdries commented Jan 22, 2024

@Pratichhya this collection SENTINEL3_SLSTR_L2_LST is not yet live, can you let us know where you found it?

@Pratichhya
Contributor Author

Pratichhya commented Jan 22, 2024

@jdries, ahh, I didn't realise that; I followed your comments in Teams. I usually start with the staging endpoint and, once everything works, migrate to the main one. I tested it this morning in production and it was fine too, so I went ahead with this.

Should I put the use cases that include LST on hold?

@jdries
Contributor

jdries commented Jan 22, 2024

We indeed won't be able to release this one before we make the collection official, which can still take a few weeks!

@Pratichhya
Contributor Author

Ahh, sure, in that case I can keep this and another use case using LST on hold. 😄

@Pratichhya marked this pull request as draft on January 22, 2024 13:39
@soxofaan
Contributor

Not sure if it's related, but a user on the openEO Platform forum was also asking about SENTINEL3_SLSTR_L2_LST: https://discuss.eodc.eu/t/mosaic-artifacts-for-sentinel-3-composites/676

@Pratichhya
Contributor Author

Oh haha, no, it's not related. And now I get why Jeroen asked where I got it from.

@Pratichhya marked this pull request as ready for review on March 22, 2024 14:55
@Pratichhya
Contributor Author

Sorry, I lost track of this PR. I think this is also ready for review.

@soxofaan
Contributor

Some notes:

  • 1 Prepare a NO2 datacube
    • you define aoi as a GeoJSON-style Polygon, but you only use it as spatial_extent in load_collection. I would just use a classic bounding box dict, which is less verbose and a lot easier to adapt and play with
  • 2 Prepare temperature datacube
    • # CREATE MASK DATACUBE these kinds of comments don't really add anything, I think. In general, as a rule of thumb, code comments in notebooks are often an anti-pattern (not always, of course): if you want to explain something, use a markdown cell, which is a richer and more user-friendly medium.
    • mask = mask >= 16384 I think that 16384 needs some explanation (what does it mean, where does it come from, ...)
    • Typically, data is stored in a 16-bit format to prevent it from occupying large space by 32-bit floating points. Therefore, we convert our LST value saved as kelvin to celsius. This is a very confusing statement: what does Kelvin-to-Celsius conversion have to do with data bit depth? And what is the goal of the operation here: avoiding 32-bit floating point or forcing it? I guess the point is that by subtracting a float 273.15 you convert from 16-bit int to (32-bit?) float. But that is far from clear to the inexperienced reader, I'm afraid.
    • about the resampling: I think it is not necessary to explicitly resample if you do a merge_cubes anyway (which will implicitly resample), unless the bilinear method is important
  • 4 Download the merged cube
    • it's noteworthy that the Incurred Costs amount to only seven openEO Credits is it necessary to mention a concrete cost here? I think it heavily depends on the spatiotemporal extent, and costs might change when we optimize things in the longer term. I would not mention a concrete cost, but just explain how to find it if that is necessary.
  • 6 Let's do the plotting and further analysis
    • The plot above suggests a potential... and The curve further suggests a potential relationsh.. why are these paragraphs in another font size and bold?
    • ax2.set_ylabel("LST(K)") I think the plotted temperatures are in Celsius, not Kelvin
    • dataset.NO2.mean(dim="x", skipna=True).mean(dim="y", skipna=True) mathematically speaking, taking the mean of means is not the same as the mean over x and y combined because of skipna=True, I think you can take the mean over multiple dimensions at the same time in xarray
    • plt.subplots(2, 2, dpi=200, figsize=(8, 6)) try to stick to around dpi=90 for notebooks that are just meant to be consumed on computer screens, otherwise font sizes (and file sizes) increase unnecessarily
    • about the plot titles "DJF", "MAM", ...: why not use self-describing titles like "Dec-Jan-Feb (DJF)", "Mar-Apr-May (MAM)", ...?

@jdries
Contributor

jdries commented Mar 28, 2024

@soxofaan I asked to start including costs in samples, because this is a question that many people have. People want to develop a sense of cost without having to run multiple workflows themselves first.
It indeed depends on the area, so the statement can perhaps be revised to better reflect that.
Same for variability: it can be mentioned, but it doesn't matter that much, in the sense that even a cost of 14 credits would still be pretty low. It's about giving a sense of magnitude rather than exact numbers.

@soxofaan
Contributor

Thanks for that background.
I think it's better to do that programmatically in the notebook, instead of hardcoding something like that in markdown text.

For example:

You can inspect the incurred costs of this job within the web editor ... or like this:

job.describe()["costs"]

Note that just rendering the job object in a notebook (so a code cell with just job in it) also shows things like incurred costs, without needing to point to the web editor. But I think this rendering is only visible when actively running the notebook yourself; it is lost in read-only viewers like the one on GitHub. Using job.describe()["costs"] is probably the most robust option.
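
As a concrete usage example (assuming job is the finished batch job object):

```python
# Fetch the job metadata once and pull out the reported costs.
metadata = job.describe()
print(f"Incurred costs: {metadata.get('costs')} credits")
```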

@soxofaan
Contributor

if this is an important aspect for demos and sample notebooks: it should not be too hard to provide a method like job.cost_overview() to get a short listing of costs (automatically with the correct currency) and other usage stats like memory/CPU
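
A rough client-side sketch of what such a helper could do today, built only on job.describe(); note that the "usage" metrics (cpu, memory, duration, ...) are optional job metadata, so whether they show up depends on the backend:

```python
def cost_overview(job):
    """Print a short cost/usage summary for a batch job (sketch, not an official API)."""
    metadata = job.describe()
    print(f"costs: {metadata.get('costs')}")
    # "usage" is optional job metadata; each metric is typically {"value": ..., "unit": ...}.
    for name, metric in (metadata.get("usage") or {}).items():
        print(f"{name}: {metric.get('value')} {metric.get('unit', '')}")
```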

@Pratichhya
Contributor Author

  • 1 Prepare a NO2 datacube

    • you define aoi as a GeoJSON-style Polygon, but you only use it as spatial_extent in load_collection. I would just use a classic bounding box dict, which is less verbose and a lot easier to adapt and play with

Updated to use a bbox dict.
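
i.e. roughly like this (coordinates are placeholders; connection as in the loading sketch above):

```python
# A plain bounding-box dict is easier to read and tweak than a GeoJSON polygon.
aoi = {"west": 4.0, "south": 50.5, "east": 5.5, "north": 51.5}

no2 = connection.load_collection(
    "SENTINEL_5P_L2",
    spatial_extent=aoi,
    temporal_extent=["2020-01-01", "2021-12-31"],
    bands=["NO2"],
)
```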

  • mask = mask >= 16384 I think that 16384 need some explanation (what does it mean, where does it come from, ..)

At this moment, I know that it is a kind of cloud code or flag code, but I'm not sure whether there is an in-depth explanation for it. I will cross-check with Caroline for more information once she is back.
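
For reference, a sketch of the masking step with the openEO Python client; the flag band name "confidence_in" is only a guess here, and the exact meaning of the 16384 threshold still needs to be confirmed from the product documentation:

```python
# Load the quality/flag band of the LST collection (band name is a guess).
flags = connection.load_collection(
    "SENTINEL3_SLSTR_L2_LST",
    spatial_extent=aoi,
    temporal_extent=["2020-01-01", "2021-12-31"],
    bands=["confidence_in"],
)

# Treat flag values >= 16384 as flagged (e.g. cloudy) pixels and replace them
# with nodata in the LST cube.
mask = flags >= 16384
lst_masked = lst.mask(mask)
```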

  • Typically, data is stored in a 16-bit format to prevent it from occupying large space by 32-bit floating points. Therefore, we convert our LST value saved as kelvin to celsius. This is a very confusing statement: what has Kelvin-to-Celsius to do with data bit depth? And what is the goal of the operation here: avoid 32b floating point or force it? I guess the point is that by subtracting a float 273.15 you convert from 16bit int to (32b?) float. But that is far from clear to the inexperienced reader I'm afraid.

Updated
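
The updated cell boils down to something like this (sketch; the point is just that subtracting the float constant 273.15 converts the Kelvin values to Celsius and promotes them to floating point):

```python
# LST comes in Kelvin; subtracting 273.15 converts it to Celsius and, as a
# side effect, promotes the stored integer values to float.
lst_celsius = lst_masked.apply(lambda x: x - 273.15)
```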

  • about the resampling: I think it is not necessary to explicitly resample if you do a merge_cubes anyway (which will implicitly resamlpe). Unless the bilinear method is important

No, bilinear is not important in this case. Thank you for the suggestion; it seems much easier.
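
So the explicit resample step is simply dropped and the merge takes care of alignment, roughly:

```python
# merge_cubes aligns/resamples the two cubes implicitly, so no explicit
# resample_cube_spatial call is needed here.
merged = no2.merge_cubes(lst_celsius)
```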

  • 4 Download the merged cube

    • it's noteworthy that the Incurred Costs amount to only seven openEO Credits is it necessary to mention a concrete cost here? I think it heavily depends on the spatiotemporal extent, and costs might change when we optimize things in the longer term. I would not mention a concrete cost, but just explain how to find it if that is necessary.

Updated the sentence and also included the code to retrieve the cost, as a better option.

  • ax2.set_ylabel("LST(K)") I think the plotted temperatures are in Celcius, not Kelvin

updated

  • dataset.NO2.mean(dim="x", skipna=True).mean(dim="y", skipna=True) mathematically speaking, taking the mean of means is not the same as the mean over x and y combined because of skipna=True, I think you can take the mean over multiple dimensions at the same time in xarray
  • plt.subplots(2, 2, dpi=200, figsize=(8, 6)) try to stick around dpi=90 for notebooks that are just meant to be consumed on computer screens, otherwise font sizes (and file sizes) increase unnecessarily
  • about using plot titles "DJF", "MAM", ... why not use self-describing titles like "Dec-Jan-Feb (DJF)", "Mar-Apr-May (MAM"), ...?

updated
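
For completeness, the three adjustments in one sketch (dataset is the xarray Dataset loaded in the notebook; names follow the snippets quoted above):

```python
import matplotlib.pyplot as plt

# Mean over both spatial dimensions in a single call (not a mean of means).
no2_mean = dataset.NO2.mean(dim=["x", "y"], skipna=True)

# dpi=90 keeps font and file sizes reasonable for on-screen notebooks.
fig, axes = plt.subplots(2, 2, dpi=90, figsize=(8, 6))

# Self-describing season titles instead of bare abbreviations.
season_titles = ["Dec-Jan-Feb (DJF)", "Mar-Apr-May (MAM)",
                 "Jun-Jul-Aug (JJA)", "Sep-Oct-Nov (SON)"]
for ax, title in zip(axes.ravel(), season_titles):
    ax.set_title(title)
```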
