
Limit for sinv needs to be increased #579

Open
SudhirNadiga-NOAA opened this issue Mar 19, 2024 · 21 comments


@SudhirNadiga-NOAA

When running sinv on a large tank, I get an error message.

[clogin07 /lfs/h2/emc/obsproc/noscrub/sudhir.nadiga]$ ls -l /lfs/h1/ops/para/dcom/20240318/b021/xx054
-rw-rw-r-- 1 ops.para para 2073922960 Mar 19 01:16 /lfs/h1/ops/para/dcom/20240318/b021/xx054
[clogin07 /lfs/h2/emc/obsproc/noscrub/sudhir.nadiga]$

ERROR MESSAGE BELOW
[clogin07 /lfs/h2/emc/obsproc/noscrub/sudhir.nadiga]$ sinv /lfs/h1/ops/para/dcom/20240318/b021/xx054
+++++++++++++++++++++WARNING+++++++++++++++++++++++
BUFRLIB: UFBTAB - THE NO. OF DATA SUBSETS IN THE BUFR FILE IS .GT. LIMIT OF 16000000 IN THE 4TH ARG. (INPUT) - INCOMPLETE READ

UFBTAB STORED 15999291 REPORTS OUT OF ********<<<
+++++++++++++++++++++WARNING+++++++++++++++++++++++

+++++++++++++++++++++WARNING+++++++++++++++++++++++
BUFRLIB: UFBTAB - THE NO. OF DATA SUBSETS IN THE BUFR FILE IS .GT. LIMIT OF 16000000 IN THE 4TH ARG. (INPUT) - INCOMPLETE READ

UFBTAB STORED 15999291 REPORTS OUT OF ********<<<
+++++++++++++++++++++WARNING+++++++++++++++++++++++

209 NOAA 18 9109137 000
223 NOAA 19 6890154 000

                     15999291

[clogin07 /lfs/h2/emc/obsproc/noscrub/sudhir.nadiga]$

How do we address this issue? Thanks.

@jbathegit
Collaborator

This is a parameter setting in the sinv utility, based on the maximum number of data subsets one would ever expect to read from a single BUFR file. We could certainly set it to a larger number, but we're already at 16 million, and since that number is used to dimension two underlying real*8 arrays, at some point we could conceivably reach a limit where the resulting compiled object is too big to load into RAM. So we may also need to modify the utility to redefine the underlying said and siid arrays as allocatable, and dynamically allocate them at run time rather than fixing their sizes at compile time.

Either way, we'd need to set some practical limit in the utility. @SudhirNadiga-NOAA do you have any idea how much larger you'd need this setting to be? Note that you can get the count of subsets in any BUFR file by just calling ufbtab with a negative logical unit number, and if you do that then you don't need to worry about how big your array is actually dimensioned because it won't actually try to read and store any of the requested mnemonics in the last argument.
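The probe-then-read pattern described above can be sketched generically. This is a hypothetical Python illustration of the idea only (the real interface is the Fortran ufbtab routine; `scan_records` and its arguments are invented for this sketch): do a count-only pass first, then size the storage from the reported total.

```python
def scan_records(records, limit=None, count_only=False):
    """Count records, optionally storing up to `limit` of them.

    A count-only call tallies every record without storing any,
    so the caller can learn the true total before dimensioning
    storage -- the same idea as calling ufbtab with a negative
    logical unit number.
    """
    stored = []
    count = 0
    for rec in records:
        count += 1
        if not count_only and (limit is None or len(stored) < limit):
            stored.append(rec)
    return count, stored

# Probe first, then size the real read from the reported total:
total, _ = scan_records(range(100), count_only=True)
count, stored = scan_records(range(100), limit=total)
```

The probe costs one extra pass over the file, but it removes the guesswork about how big the fixed-size array has to be.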

Also CCing @jack-woollen for his awareness :-)

@jbathegit
Collaborator

FWIW, I just did this for the file you mentioned above, and it had just under 110 million subsets in it! Do you know if that's a typical daily count for these files?

@SudhirNadiga-NOAA
Author

Thanks for all your efforts. I will need to ask Iliana if she knows how big our tanks can get in the next year or so. My guess is that the biggest tanks we have are the b021/xx206 tanks.
In the past, the poes-sst tanks used to be our biggest tanks, but these CrIS 431 tanks are huge, and the CrIS 2211 tanks (only made by development) are even bigger. I don't know if that fully answers your question, but I can ask Iliana if she has a better answer.
[clogin06 /lfs/h1/ops/prod/dcom/20240319]$ ls -lrt b021/*
-rw-rw-r-- 1 dfprod prod 263835224 Mar 20 00:05 b021/xx042
-rw-rw-r-- 1 dfprod prod 349471088 Mar 20 00:12 b021/xx213
-rw-rw-r-- 1 ops.prod prod 413967832 Mar 20 00:21 b021/xx246
-rw-rw-r-- 1 dfprod prod 1621221504 Mar 20 00:32 b021/xx039
-rw-rw-r-- 1 ops.prod prod 603775608 Mar 20 00:35 b021/xx046
-rw-rw-r-- 1 ops.prod prod 1951418592 Mar 20 00:36 b021/xx045
-rw-rw-r-- 1 dfprod prod 55290464 Mar 20 00:37 b021/xx044
-rw-rw-r-- 1 ops.prod prod 1698900976 Mar 20 01:04 b021/xx248
-rw-rw-r-- 1 dfprod prod 1346407584 Mar 20 01:13 b021/xx239
-rw-rw-r-- 1 dfprod prod 235666528 Mar 20 01:13 b021/xx036
-rw-rw-r-- 1 dfprod prod 66225472 Mar 20 01:13 b021/xx033
-rw-rw-r-- 1 ops.prod prod 947068136 Mar 20 01:16 b021/xx053
-rw-rw-r-- 1 ops.prod prod 1976323824 Mar 20 01:16 b021/xx054
-rw-rw-r-- 1 ops.prod prod 62270376 Mar 20 01:17 b021/xx028
-rw-rw-r-- 1 dfprod prod 140132320 Mar 20 01:19 b021/xx035
-rw-rw-r-- 1 ops.prod prod 5941454184 Mar 20 01:34 b021/xx206
-rw-rw-r-- 1 ops.prod prod 828036152 Mar 20 01:51 b021/xx051
-rw-rw-r-- 1 ops.prod prod 1971780304 Mar 20 01:52 b021/xx052
-rw-rw-r-- 1 ops.prod prod 3153255264 Mar 20 01:53 b021/xx241
-rw-rw-r-- 1 ops.prod prod 350048312 Mar 20 01:55 b021/xx201
-rw-rw-r-- 1 ops.prod prod 56303488 Mar 20 01:59 b021/xx023
-rw-rw-r-- 1 ops.prod prod 56355784 Mar 20 01:59 b021/xx123
-rw-rw-r-- 1 ops.prod prod 185883912 Mar 20 02:00 b021/xx027
-rw-rw-r-- 1 ops.prod prod 870942000 Mar 20 08:19 b021/xx203
-rw-rw-r-- 1 dfprod prod 307493728 Mar 20 08:48 b021/xx038
-rw-rw-r-- 1 dfprod prod 2338872816 Mar 20 12:14 b021/xx212
[clogin06 /lfs/h1/ops/prod/dcom/20240319]$ uftab b021/xx042

@SudhirNadiga-NOAA
Author

@ilianagenkova Do we have an idea as to the maximum desired for sinv in terms of number of subsets?

@SudhirNadiga-NOAA
Author

It looks like the xx054 tank has far more subsets than xx206 (~18x):
[clogin07 /lfs/h1/ops/prod/dcom/20240319/b021]$ binv xx206

type messages subsets bytes

NC021206 32409 5832000 1646306317 179.95
TOTAL 32409 5832000 1646306317

[clogin07 /lfs/h1/ops/prod/dcom/20240319/b021]$ binv xx054

type messages subsets bytes

NC021054 4943114 104542051 1953387584 21.15
TOTAL 4943114 104542051 1953387584

[clogin07 /lfs/h1/ops/prod/dcom/20240319/b021]$

@SudhirNadiga-NOAA
Author

I am running binv on all our tanks, so we should have an idea as to the largest tanks by number of subsets. This will help answer your question.

@SudhirNadiga-NOAA
Author

My check shows that the poes_sst files have close to 600 million subsets.

type messages subsets bytes

NC012023 139362 556549305 -990011533 3993.55
TOTAL 139362 556549305 -990011533

Iliana and I will discuss this problem and get back to you with any further details, since we are considering writing the two different instruments to separate tanks, both to reduce the number of subsets and for ease of dumping.

@jbathegit
Collaborator

@SudhirNadiga-NOAA @ilianagenkova any updates on this?

@ilianagenkova

I'll provide the largest obs counts that we see (per tank) in the next few days.

@jbathegit
Collaborator

@SudhirNadiga-NOAA @ilianagenkova any updates? I'm asking b/c I never heard anything more from either of you about this.

Or, if you're no longer interested in pursuing this, then please feel free to close this issue.

@ilianagenkova

@jbathegit , the largest tanks at the moment are ~10GB
ls -l /lfs/h1/ops/prod/dcom//20241113/b012/xx023
-rw-rw-r-- 1 ops.prod prod 10711038888 Nov 14 01:59 /lfs/h1/ops/prod/dcom//20241113/b012/xx023

and they contain roughly 700-800 million subsets:

[clogin03 /lfs/h2/emc/obsproc/noscrub/iliana.genkova/TEMPTEST]$ binv /lfs/h1/ops/prod/dcom//20241113/b012/xx023
type messages subsets bytes
NC012023 195735 781677415 2119893694 3993.55
TOTAL 195735 781677415 2119893694

sinv already can't handle these. If it's too much work to update sinv, we can use the simple code we wrote:
/lfs/h2/emc/obsproc/noscrub/iliana.genkova/Utils/READALL/count_readall.x (on Cactus)

@jack-woollen
Contributor

@ilianagenkova The problem with sinv as it is, is that it tries to read all the data into a table first, which can be quicker than parsing each subset separately. I didn't envision multiple hundreds of millions of subsets in a file when writing it! I could rewrite it (or a lookalike code) to just read through and count the data, which doesn't require storing it. That's pretty simple. I'll make a sample and see if that works out.
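The streaming idea Jack describes can be sketched as follows, in hypothetical Python for illustration only (sinv and ufbtab are Fortran; the function name and argument here are invented): keep one running counter per satellite ID instead of a table of all subsets, so memory scales with the number of distinct satellites rather than the number of subsets.

```python
from collections import Counter

def inventory_streaming(subset_sat_ids):
    """Tally subsets per satellite ID without storing the subsets.

    Peak memory is proportional to the number of distinct
    satellite IDs, not the number of subsets, so a file with
    hundreds of millions of subsets needs no giant table.
    """
    counts = Counter()
    for said in subset_sat_ids:  # said = satellite ID of one subset
        counts[said] += 1
    return counts
```

For example, `inventory_streaming([209, 223, 209])` returns `Counter({209: 2, 223: 1})`, the same shape of result as the per-satellite count lines in the sinv output near the top of this thread.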

@ilianagenkova

Again, only if it's a relatively easy lift. I don't know how many people use the NCEPLIBS-bufr utilities, or whether it's justified to update sinv only for us/obsproc. My count_readall.x is pretty slow, but at least it does the job.

@jack-woollen
Contributor

@ilianagenkova My reading everything one at a time is no faster than yours, I'm sure. There are some tricky things that might work faster, but they will take longer to develop. I'll let you know if I find the time to find them! It's an interesting problem for sure.

@jack-woollen
Contributor

@ilianagenkova on hera:

[screenshot: sinv.x timing test on Hera]

I'll make one on wcoss2.

@jack-woollen
Contributor

@ilianagenkova - the wcoss2 version is here; runt builds sinv.x and runs a timing test. wcoss2 is a little slower than hera.

[screenshot: sinv.x timing test on WCOSS2]

@ilianagenkova

@jack-woollen, this is fantastic! Thank you!
I tested the code on wcoss2; it works for me and runs fast - the new sinv handled an 11GB tank (one of the largest I've seen) in about a minute. I'll keep a copy of the code and study it when I get a moment.

I will leave it up to you and @jbathegit to decide whether to add the new sinv to the Bufr library utilities as one more utility, or to replace the current sinv. Also, you can do it when it's convenient for you. We can use your executable until then.

@SudhirNadiga-NOAA @SteveStegall-NOAA , you can test it too (Dogwood):
/lfs/h2/emc/global/noscrub/Jack.Woollen/jwfork/newsinv/sinvx
or
/lfs/h2/emc/obsproc/noscrub/iliana.genkova/Utils/JW_newsinv/sinv.x

@jbathegit
Collaborator

@jack-woollen thanks for this, but it looks like whatever changes you made in /lfs/h2/emc/global/noscrub/Jack.Woollen/jwfork/newsinv were to a very old version of the sinv code, rather than to the latest release version in the develop branch of the NCEPLIBS-bufr GitHub repository. For example:

  • your code is still the older F77, whereas the latest release version is now F90
  • your code doesn't have the recent update to allow specification of the master table directory as an optional 2nd call argument
  • and most significantly, your code still has the old block sadline and sidline arrays, which means all of the satellite and instrument meanings are still hardcoded into the sinv source code rather than using getcfmng() to read them from the master tables

Is there some particular reason you made your updates to an older version of the sinv code, rather than to the latest release version in the develop branch? I'm asking b/c I don't see how we can merge any of this into the develop branch in GitHub, unless that was never your intent to begin with(?)

@jack-woollen
Contributor

@jbathegit Good point. I never meant to merge this code into develop because of the significant change to ufbtab that is needed along with it. I have to think that through when I have more time. I'll get back to it when I have a chance to work it through.

@ilianagenkova

@jack-woollen , are the changes in ufbtab meant to speed it up or to handle larger data arrays?

@jack-woollen
Contributor

@ilianagenkova The change to ufbtab allows an app to call it repeatedly to extract data progressively from one input file. The current ufbtab is one and done. If it can't read all the data in one call, it errs off. Since the modified sinv is collapsing the data by itemizing counts of satellite types, it doesn't need to store all the data at one time. The purpose of ufbtab has always been to turbo read simple elements from complicated files. That doesn't change. But the modified version defines output arguments differently. Probably I will introduce a new version of ufbtab with a new name to preserve the current interface of ufbtab. When I get a couple more days to work on it, I'll merge the new stuff into develop. Other things are calling at the moment!
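The repeated-call interface Jack describes can be sketched generically, again in hypothetical Python (the real routine is Fortran, and its eventual name and interface are still undecided per the comment above): each call yields the next chunk until the input is exhausted, so the caller's storage only ever needs to hold one chunk at a time.

```python
def read_in_chunks(stream, chunk_size):
    """Yield successive chunks of at most chunk_size items.

    The caller processes each chunk (e.g. adds it to running
    counts) and discards it, so peak memory is bounded by
    chunk_size rather than by the size of the input file.
    """
    chunk = []
    for item in stream:
        chunk.append(item)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:  # final partial chunk
        yield chunk

total = 0
for chunk in read_in_chunks(iter(range(10)), 4):
    total += len(chunk)  # process, then let the chunk go
```

Combined with the per-satellite tally idea earlier in the thread, this is why the modified sinv never needs to hold all the subsets at once.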
