OM format decoding problem #20
Comments
Hi, for which programming language are you trying to implement a reader? We are actively working on direct implementations for various programming languages. The file format will also be revised to support more than 2 dimensions, streaming write support, cloud-native reads, and a further improved compression ratio. Here is the branch implementing the new writer/reader; it is not yet functional.
You might be correct. The size is calculated just by using
I tried it around a month ago using Python, with no success. When I tried it I had about an hour of 'free' time, and after that I went on holidays for a month, so I can't remember the details or what the issue was, but the data after the header made no sense to me. It would be very useful if someone could provide any hints.
@patrick-zippenfenig For C#. I thought it would be simple enough, but TurboPFor lacking a C# wrapper, together with my inexperience with integer compression, has been a big problem so far.
We aim to provide low-level C functions to interact with OM files. This will abstract chunking and compression. Integrations into other programming languages using asynchronous IO should then be "relatively" easy. Here are some additional notes: fsspec/kerchunk#464
I'm trying to make my own decoder for OM files, but I get a discrepancy between the estimated number of chunks and what's in the file.
For example, I am processing the MSM model, temperature (file chunk_4209.om).
Header (56 bytes, as stated in the documentation):
OM(2): OM
Version(1): 2
Compression(1): 0
ScaleFactor(4): 20
Dim0(8): 242905
Dim1(8): 114
Chunk0(8): 26
Chunk1(8): 114
???(8): 790
???(8): 1469
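For reference, here is a minimal sketch (Python) of reading the header exactly as I interpret it above. Little-endian byte order and a 4-byte integer ScaleFactor are my assumptions (ScaleFactor may well be a float), and the field names are mine:

```python
import struct

def parse_header(buf: bytes) -> dict:
    # Field layout taken from the dump above, not from an official spec.
    magic = buf[0:2]                                    # b"OM"
    version = buf[2]                                    # 2
    compression = buf[3]                                # 0
    (scale_factor,) = struct.unpack_from("<i", buf, 4)  # 20
    dim0, dim1, chunk0, chunk1 = struct.unpack_from("<4q", buf, 8)
    # Two 8-byte fields I cannot identify: 790 and 1469 in this file.
    unknown0, unknown1 = struct.unpack_from("<2q", buf, 40)
    return {
        "magic": magic, "version": version, "compression": compression,
        "scale_factor": scale_factor,
        "dim0": dim0, "dim1": dim1, "chunk0": chunk0, "chunk1": chunk1,
        "unknown0": unknown0, "unknown1": unknown1,
    }
```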
From this I get that there are 9343 chunks in the file: (114/114) * ceil(242905/26) = 1 * 9343 (242905/26 = 9342.5, rounded up).
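In other words, a ceiling division along each dimension:

```python
import math

dim0, dim1, chunk0, chunk1 = 242905, 114, 26, 114
n_chunks = math.ceil(dim0 / chunk0) * math.ceil(dim1 / chunk1)
print(n_chunks)  # 9343 * 1 = 9343
```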
I start reading the array of offsets:
Pos:56 ChunkN:0 Offset:2177
Pos:64 ChunkN:1 Offset:2858
Pos:72 ChunkN:2 Offset:3484
Pos:80 ChunkN:3 Offset:4139
It's already unclear why the first offset is so large, as if this one entry covered three chunks at once.
At the last positions I get incorrect values, as if the data already begins where the offset values should be:
Pos:74776 ChunkN:9340 Offset:6206997
Pos:74784 ChunkN:9341 Offset:1407379585452674
Pos:74792 ChunkN:9342 Offset:-4611685990509789180
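For completeness, this is a minimal sketch of how I dump the table (Python; signed little-endian 64-bit values starting right after the 56-byte header are my assumptions):

```python
import struct

with open("chunk_4209.om", "rb") as f:
    buf = f.read()

# One signed 64-bit little-endian offset per chunk, assumed to start
# immediately after the 56-byte header.
for n in range(9343):
    pos = 56 + 8 * n
    (offset,) = struct.unpack_from("<q", buf, pos)
    print(f"Pos:{pos} ChunkN:{n} Offset:{offset}")
```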
I have a guess that the offsets are actually written from the 40th byte: the two unknown header values (790 and 1469) would then be the first two offsets, continuing monotonically into 2177, 2858, and so on.
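A quick way to test this guess: re-read the table from byte 40 and check that all 9343 offsets stay positive and strictly increasing through the last chunk, e.g.:

```python
import struct

with open("chunk_4209.om", "rb") as f:
    buf = f.read()

# If the table really starts at byte 40, the values should run
# 790, 1469, 2177, 2858, ... and never go negative or decrease.
offsets = struct.unpack_from("<9343q", buf, 40)
print(all(0 < a < b for a, b in zip(offsets, offsets[1:])))
```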
Is there some problem with the documentation?