NDJSON test data doesn't contain variable names #49
Comments
@mstackhouse Are you aware, or able to check with Sam or Lex to confirm, whether the test data for NDJSON is valid?
@nicholas-masel are you talking about the row-level data itself (the data records), or the variable-level metadata? I ask because the non-NDJSON data is structured the same way. From here:
{
"datasetJSONCreationDateTime": "2023-06-28T15:38:43",
"datasetJSONVersion": "1.1.0",
"fileOID": "www.sponsor.xyz.org.project123.final",
"dbLastModifiedDateTime": "2023-05-31T00:00:00",
"originator": "Sponsor XYZ",
"sourceSystem": {
"name": "Software ABC",
"version": "1.0.0"
},
"studyOID": "cdisc.com.CDISCPILOT01",
"metaDataVersionOID": "MDV.MSGv2.0.SDTMIG.3.3.SDTM.1.7",
"metaDataRef": "https://metadata.location.org/CDISCPILOT01/define.xml",
"itemGroupOID": "IG.DM",
"isReferenceData": false,
"records": 18,
"name": "DM",
"label": "Demographics",
"columns": [
{"itemOID": "ITEMGROUPDATASEQ", "name": "ITEMGROUPDATASEQ", "label": "Record Identifier", "dataType": "integer"},
{"itemOID": "IT.STUDYID", "name": "STUDYID", "label": "Study Identifier", "dataType": "string", "length": 12, "keySequence": 1},
{"itemOID": "IT.DOMAIN", "name": "DOMAIN", "label": "Domain Abbreviation", "dataType": "string", "length": 2},
{"itemOID": "IT.USUBJID", "name": "USUBJID", "label": "Unique Subject Identifier", "dataType": "string", "length": 8, "keySequence": 2},
{"itemOID": "IT.AGE", "name": "AGE", "label": "Age", "dataType": "integer"}
],
"rows": [
[1, "CDISCPILOT01", "DM", "CDISC001", 84],
[2, "CDISCPILOT01", "DM", "CDISC002", 76],
[3, "CDISCPILOT01", "DM", "CDISC003", 61],
...
]
}
The only change for NDJSON is that the elements of rows are instead on their own lines of the file:
{
"datasetJSONCreationDateTime": "2023-06-28T15:38:43",
"datasetJSONVersion": "1.1.0",
"fileOID": "www.sponsor.xyz.org.project123.final",
"dbLastModifiedDateTime": "2023-05-31T00:00:00",
"originator": "Sponsor XYZ",
"sourceSystem": {
"name": "Software ABC",
"version": "1.0.0"
},
"studyOID": "cdisc.com.CDISCPILOT01",
"metaDataVersionOID": "MDV.MSGv2.0.SDTMIG.3.3.SDTM.1.7",
"metaDataRef": "https://metadata.location.org/CDISCPILOT01/define.xml",
"itemGroupOID": "IG.DM",
"isReferenceData": false,
"records": 18,
"name": "DM",
"label": "Demographics",
"columns": [
{"itemOID": "ITEMGROUPDATASEQ", "name": "ITEMGROUPDATASEQ", "label": "Record Identifier", "dataType": "integer"},
{"itemOID": "IT.STUDYID", "name": "STUDYID", "label": "Study Identifier", "dataType": "string", "length": 12, "keySequence": 1},
{"itemOID": "IT.DOMAIN", "name": "DOMAIN", "label": "Domain Abbreviation", "dataType": "string", "length": 2},
{"itemOID": "IT.USUBJID", "name": "USUBJID", "label": "Unique Subject Identifier", "dataType": "string", "length": 8, "keySequence": 2},
{"itemOID": "IT.AGE", "name": "AGE", "label": "Age", "dataType": "integer"}
]
}
[1, "CDISCPILOT01", "DM", "CDISC001", 84]
[2, "CDISCPILOT01", "DM", "CDISC002", 76]
[3, "CDISCPILOT01", "DM", "CDISC003", 61]
...
Yeah, I was talking about variable names on the row-level data. I reached out to Sam, and he confirmed they were not included because of file size. I am trying out reading as a list instead of a df. It seems to work, but it causes some other type issues downstream that didn't appear when reading directly to a df.
yyjsonr::read_ndjson_str(
file,
type = "list",
nskip = 1,
opts = json_opts
)
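One way to recover the variable names when the rows are read as bare lists is to take them from the "columns" array on the first line of the file. The following is only a minimal sketch under stated assumptions: it uses jsonlite rather than yyjsonr, `read_dataset_ndjson` is a hypothetical helper name that does not exist in either package, and the input is assumed to be a character vector with the metadata object on line 1 and one row array per following line.

```r
library(jsonlite)

# Hypothetical helper (an illustration, not part of yyjsonr or jsonlite).
# `lines` is a character vector: metadata JSON first, then one row per line.
read_dataset_ndjson <- function(lines) {
  # Line 1 holds the dataset-level metadata, including the "columns" array.
  meta <- fromJSON(lines[1], simplifyVector = FALSE)
  col_names <- vapply(meta$columns, function(col) col$name, character(1))

  # Every following line is one row: a bare JSON array of values.
  rows <- lapply(lines[-1], fromJSON, simplifyVector = FALSE)

  # Build one column at a time, turning JSON null (NULL in R) into NA.
  cols <- lapply(seq_along(col_names), function(j) {
    values <- lapply(rows, function(row) if (is.null(row[[j]])) NA else row[[j]])
    unlist(values)
  })
  names(cols) <- col_names
  as.data.frame(cols, stringsAsFactors = FALSE)
}

# Usage with a cut-down version of the DM example from this thread.
lines <- c(
  '{"name": "DM", "columns": [{"name": "USUBJID", "dataType": "string"}, {"name": "AGE", "dataType": "integer"}]}',
  '["CDISC001", 84]',
  '["CDISC002", 76]'
)
dm <- read_dataset_ndjson(lines)
```

Building the columns one at a time lets each column settle on a single type, which may help with the downstream type issues mentioned above, although that has not been checked against the real test data.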
The NDJSON data doesn't contain variable names in each row, only values.
For example:
With variable names:
{"name": "Leandro", "lastName": "Shokida"}
{"name": "Mariano", "lastName": "De Achaval"}
Without variable names:
["Leandro", "Shokida"]
["Mariano", "De Achaval"]
From what I can tell we can: