
Data Challenge 0 release #7

Closed
zonca opened this issue Mar 10, 2023 · 14 comments

Comments

@zonca
Member

zonca commented Mar 10, 2023

@rpwagner I'm transferring the data to UCSD. It is 1.4 TB.

Currently everything is inside a folder named chlat. We do not want to move folders or rename files, so that we keep the same organization that we have at NERSC.

So the structure is:

dc0/chlat/splitXX/FREQ/*.fits

For now we only have the Chile Large Aperture Telescope (chlat), but we want to keep the telescope folder level, because we will add more telescopes.
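Because every file sits at a fixed depth under the telescope folder, a path can be split back into its components mechanically. The snippet below is a minimal sketch assuming exactly the `release/telescope/splitXX/FREQ/file.fits` layout described above; `MapLocation` and `parse_release_path` are hypothetical names, and the `027` frequency folder in the usage example is illustrative.

```python
from pathlib import PurePosixPath
from typing import NamedTuple


class MapLocation(NamedTuple):
    release: str    # e.g. "dc0"
    telescope: str  # e.g. "chlat"; more telescopes will be added later
    split: str      # e.g. "split01"
    freq: str       # frequency folder name
    filename: str


def parse_release_path(path: str) -> MapLocation:
    """Split <release>/<telescope>/<splitXX>/<FREQ>/<file>.fits into parts."""
    parts = PurePosixPath(path).parts
    if len(parts) < 5 or not parts[-1].endswith(".fits"):
        raise ValueError(f"unexpected layout: {path}")
    # Use only the last five components, so a prefix such as
    # /datareleases is ignored.
    release, telescope, split, freq, filename = parts[-5:]
    if not split.startswith("split"):
        raise ValueError(f"expected a splitXX folder, got {split!r}")
    return MapLocation(release, telescope, split, freq, filename)


if __name__ == "__main__":
    print(parse_release_path(
        "/datareleases/dc0/chlat/split01/027/dc0_chlat_t16.01_027_map03.fits"))
```

Keeping the telescope level now means this kind of parsing keeps working unchanged once a second telescope folder appears next to chlat.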

@rpwagner, could you import the data into the portal using the same scripts you used for NPIPE?
Then I'll customize the appearance further, and afterwards we will consider whether to extract more metadata from the headers (#5).

@rpwagner
Collaborator

@zonca I see the data moving now. Once it's done being copied from NERSC, I'll move the chlat folder to /datareleases/dc0/chlat so that each Data Challenge has its own folder. Then I'll create initial pages and metadata as before.

Can you tell me what splitXX refers to so I can make that part of the page organization?

@zonca
Member Author

zonca commented Mar 10, 2023

It is a time split.
split01 is the whole dataset.
The other splitXX folders are interleaved time splits, mostly used for different kinds of cross-correlation.

@rpwagner
Collaborator

The data is in /datareleases/dc0 and I've generated the manifests for each bottom-level folder. I'm aiming to have the basic Markdown and resulting web pages done sometime tomorrow.
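The manifest format used for the portal is not described in this thread. As a hypothetical sketch only, a per-folder manifest could be built by walking the release tree and recording, for each bottom-level folder (one with no subfolders), every file's name, size, and SHA-256 checksum. `build_manifests` and the entry fields are illustrative assumptions, not the actual portal scripts.

```python
from __future__ import annotations

import hashlib
import os
from pathlib import Path


def sha256sum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so multi-GB FITS files never load fully into memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifests(root: Path) -> dict[str, list[dict]]:
    """Map each bottom-level folder (relative to root) to its file entries."""
    manifests: dict[str, list[dict]] = {}
    for dirpath, dirnames, filenames in os.walk(root):
        if dirnames:  # not a bottom-level folder, so no manifest here
            continue
        folder = Path(dirpath)
        entries = []
        for name in sorted(filenames):
            p = folder / name
            entries.append({
                "name": name,
                "size": p.stat().st_size,
                "sha256": sha256sum(p),
            })
        manifests[folder.relative_to(root).as_posix()] = entries
    return manifests
```

For dc0 this would yield one manifest per `chlat/splitXX/FREQ` folder, matching the level where the FITS files live.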

@rpwagner
Collaborator

> It is a time split.
> split01 is the whole dataset.
> The other splitXX folders are interleaved time splits, mostly used for different kinds of cross-correlation.

@zonca are the splits time-based? As in, split01 is the full year, split02 is 6 months, split04 is 3 months, etc.?

@jdborrill
Copy link
Contributor

jdborrill commented Mar 14, 2023

Although that is correct in terms of the total amount of time in each split, the splits themselves are formed by interleaving the individual days. We start by splitting into 32 maps, assigning days round-robin, and then progressively combine them pairwise to 16, 8, 4, 2 and 1.
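The split construction described above can be sketched as follows. The round-robin assignment into 32 maps follows the description; which pairs are merged at each step is not stated, so this sketch assumes map `k` is combined with map `k + n/2`. That choice keeps every level a clean interleaving (split `k` of `n` holds the days whose index is congruent to `k` mod `n`). All function names are hypothetical.

```python
def round_robin_splits(days: list[int], n_maps: int = 32) -> list[list[int]]:
    """Deal observing days into n_maps interleaved subsets, round-robin."""
    splits: list[list[int]] = [[] for _ in range(n_maps)]
    for i, day in enumerate(days):
        splits[i % n_maps].append(day)
    return splits


def combine_pairwise(splits: list[list[int]]) -> list[list[int]]:
    """Halve the number of splits. Assumption: merge split k with split k + n/2."""
    half = len(splits) // 2
    return [sorted(splits[k] + splits[k + half]) for k in range(half)]


def split_hierarchy(days: list[int], n_maps: int = 32) -> dict[int, list[list[int]]]:
    """Return {32: [...], 16: [...], ..., 1: [...]} of day lists per split."""
    if n_maps < 1 or n_maps & (n_maps - 1):
        raise ValueError("n_maps must be a power of two")
    levels = {n_maps: round_robin_splits(days, n_maps)}
    current = levels[n_maps]
    while len(current) > 1:
        current = combine_pairwise(current)
        levels[len(current)] = current
    return levels


if __name__ == "__main__":
    levels = split_hierarchy(list(range(365)))
    print({n: len(levels[n][0]) for n in sorted(levels, reverse=True)})
```

The single map at the top level contains every day, i.e. the split01 full dataset.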

Also, although we only simulate 1 year of data in this Data Challenge, we then rescale the maps to correspond to a 7-year mission. These data correspond to the full mission (and splits thereof).

@rpwagner
Copy link
Collaborator

Alright, I think I get that. Each map in split32 corresponds to about 80 total days of observation, where those observations are done once every 32 days. split16 will be about 160 total days, etc.
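The "about 80 days" figure comes from spreading a 7-year mission of 365-day years evenly over 32 interleaved maps. The quick check below reproduces it for each split level; `effective_days` is a hypothetical helper, and `efficiency` is an assumed knob for the fraction of days actually observed (1.0 means every day).

```python
def effective_days(n_splits: int, mission_years: float = 7,
                   days_per_year: float = 365, efficiency: float = 1.0) -> float:
    """Total observing days represented by one map in a split of n_splits maps."""
    return mission_years * days_per_year * efficiency / n_splits


if __name__ == "__main__":
    for n in (32, 16, 8, 4, 2, 1):
        print(f"split{n:02d}: ~{effective_days(n):.1f} days per map")
```

With the defaults this gives about 79.8 days per split32 map and 2555 days for split01; any real observing efficiency below 1.0 scales every level down proportionally.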

Is there a short description of the difference between the *map03.fits and *map02_c111.fits files, like dc0_chlat_t16.01_027_map02_c111.fits and dc0_chlat_t16.01_027_map03.fits?

I'm assuming that 027, 039, etc., are GHz.
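The meaning of each filename field is defined in the data schema document, which is not reproduced here. The sketch below only splits names like the two examples above into underscore-separated fields; the field labels (`tag`, `map`, `extra`) are hypothetical, and reading `027` as GHz is the assumption from the previous sentence.

```python
import re

# Hypothetical pattern for names like dc0_chlat_t16.01_027_map02_c111.fits.
FILENAME_RE = re.compile(
    r"^(?P<release>[^_]+)_(?P<telescope>[^_]+)_(?P<tag>[^_]+)_"
    r"(?P<freq>\d{3})_(?P<map>map\d{2})(?:_(?P<extra>c\d{3}))?\.fits$"
)


def parse_map_filename(name: str) -> dict:
    """Split a DC0-style map filename into its fields (labels are assumptions)."""
    m = FILENAME_RE.match(name)
    if m is None:
        raise ValueError(f"unrecognized filename: {name}")
    fields = m.groupdict()  # "extra" is None when there is no cYYY suffix
    fields["freq_ghz"] = int(fields["freq"])  # assumes the 3-digit field is GHz
    return fields


if __name__ == "__main__":
    print(parse_map_filename("dc0_chlat_t16.01_027_map02_c111.fits"))
    print(parse_map_filename("dc0_chlat_t16.01_027_map03.fits"))
```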

@zonca
Member Author

zonca commented Mar 14, 2023

@rpwagner I gave you access to the data schema document, which explains those details.

@zonca
Member Author

zonca commented Mar 14, 2023

About the length of each observation: I think it depends on the scanning strategy, but I believe it is of the order of a few hours.

@jdborrill
Contributor

jdborrill commented Mar 14, 2023

Conceptually you're right; in practice we don't observe for the full 365 days a year (weather events, outages, etc.), so it's a bit less than 80/160/...

The lookup values for things like mapXX, cYYY, etc, are also in the file headers.

@rpwagner
Copy link
Collaborator

rpwagner commented Mar 16, 2023

Thanks for sharing the data schema document.

I've granted the Globus group CMB-S4 Collaborators read access to /datareleases/dc0/. Users who should have access to the data release should be invited to this group.

@jdborrill I've sent you an invite to that group. Once you join I'll make you a group manager. @zonca you already are.

@rpwagner
Copy link
Collaborator

There's a basic site for DC0 in PR #8. Adding content to the main data release page (named dc0.md) will probably have the most impact for making the data useful to users.

For data access, we can also start inviting users to the Collaborators group. You may want to make that part of the onboarding and off-boarding processes.

@zonca
Copy link
Member Author

zonca commented Mar 17, 2023

Great!
I actually already have a group we can use for the CMB-S4 members, and I've invited you to it. Could you use that group?

@rpwagner
Copy link
Collaborator

Absolutely! I’ve granted that group read access to the release in the Globus collection. We’ll need to update the links on the page describing the group.

@zonca
Copy link
Member Author

zonca commented Mar 17, 2023

Great job @rpwagner!
@jdborrill we have a first version online at https://cmb-s4.github.io/serverless-data-portal-cmb-s4/index.html
We can work on organization and documentation over the next few days.

zonca closed this as completed Mar 17, 2023

3 participants