Compressed dataset with index or primary key confuses dataset reader #79

Open
rvasil opened this issue Apr 27, 2022 · 0 comments

rvasil commented Apr 27, 2022

I get an out-of-bounds error when reading a SAS dataset that has:

  • BINARY compression,
  • and either:
    • a composite index, or
    • a primary key defined,
  • and at least 3 pages, e.g. the SAS log shows:
    Compressed is 3 pages; un-compressed would require 4 pages

This happens with the current SASLib v1.3.1; SAS is 9.4 on Windows.
I did not see this issue with CHAR compression.
I wonder whether you have access to SAS and can replicate this.

data work.test_comp_pk(compress=binary);
length rowno 8 text text2 $50;
do rowno = 1 to 2000;
   text = catt("text", put(rowno, z8.));
   text2 = "anything";
   output;
end;
run;

/* either this */
proc sql;
create index ix on work.test_comp_pk (rowno, text)
;
quit;

/* or this */
proc sql;
alter table work.test_comp_pk
add constraint pk primary key (rowno)
;
quit;

When either the composite index or the primary key is created on this sample dataset, I get an error like this:

julia> s = readsas(raw"E:\temp\test_comp_pk.sas7bdat")
ERROR: BoundsError: attempt to access 33-element Vector{UInt8} at index [34]
Stacktrace:
  [1] getindex
    @ .\array.jl:861 [inlined]
  [2] rle_decompress(output_len::Int64, input::Vector{UInt8})
    @ SASLib D:\lang\julia_packages\packages\SASLib\AkuJe\src\SASLib.jl:1424
  [3] process_byte_array_with_data(handler::SASLib.Handler, offset::Int64, length::Int64, compression::UInt8)
    @ SASLib D:\lang\julia_packages\packages\SASLib\AkuJe\src\SASLib.jl:1198
  [4] readline(handler::SASLib.Handler)
    @ SASLib D:\lang\julia_packages\packages\SASLib\AkuJe\src\SASLib.jl:1119

The index or primary key is recorded in the SAS dataset metadata, and its presence seems to confuse the reading logic.
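The stack trace shows `rle_decompress`, the decoder for run-length (CHAR) compression, being called even though this dataset uses BINARY compression, which SAS implements as RDC (Ross Data Compression). One possibility, not confirmed here, is that the extra index/primary-key metadata shifts where the compression method is read from, so the wrong decoder gets chosen. The Julia sketch below shows a strict dispatch that would report an unrecognized compression literal as a clear error rather than falling back to a default decoder. It is hypothetical: `compression_method` and the constant names are illustrative, not SASLib's API, and the literals `SASYZCRL`/`SASYZCR2` follow the convention used by other SAS7BDAT readers such as pandas.

```julia
# Hypothetical sketch, not SASLib's actual implementation.
# Assumption: literals follow the convention used by other SAS7BDAT readers:
#   "SASYZCRL" -> RLE (COMPRESS=CHAR), "SASYZCR2" -> RDC (COMPRESS=BINARY).
const RLE_LITERAL = "SASYZCRL"
const RDC_LITERAL = "SASYZCR2"

"""
    compression_method(literal) -> Symbol

Map the compression literal read from the file metadata to a decoder:
`:rle`, `:rdc`, or `:none` for uncompressed data. Unknown literals raise
an `ArgumentError` instead of silently selecting a default decoder.
"""
function compression_method(literal::AbstractString)
    # Assumed defensive step: ignore trailing NUL/space padding.
    lit = rstrip(literal, ['\0', ' '])
    lit == RLE_LITERAL && return :rle
    lit == RDC_LITERAL && return :rdc
    isempty(lit) && return :none
    throw(ArgumentError("unrecognized compression literal: $(repr(literal))"))
end

# Usage: a BINARY-compressed dataset should select the RDC decoder.
println(compression_method("SASYZCR2"))   # rdc
println(compression_method("SASYZCRL"))   # rle
```

With a check like this, a misread literal would fail early with the offending bytes in the message, which is easier to diagnose than a `BoundsError` inside the decoder.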

EDIT: I originally reported garbled results, but that was because I had broken the code while trying to find out what was going on :-(.
Sorry for the confusion, and thank you for all the effort.
