Automated detection for use of legally restricted codes #1961
Comments
From @Jongmassey
rough prototype

```python
import csv
from collections import defaultdict
from pathlib import Path

# Locate the description file in the unpacked release
description_file = next(
    Path("Full/Terminology/").glob("sct2_Description_UKCRFull*.txt")
)
exclusion_refset_pattern = "General practice summary data sharing exclusion"

# Map refset conceptId -> term for every description matching the pattern
with description_file.open("r") as f:
    reader = csv.DictReader(f, delimiter="\t")
    exclusion_refset_concepts = {
        r["conceptId"]: r["term"]
        for r in reader
        if exclusion_refset_pattern in r["term"]
    }

# Collect the member concepts of each exclusion refset
excluded_concepts = defaultdict(list)
content_file = next(
    Path("Full/Refset/Content/").glob("der2_Refset_SimpleUKCRFull*.txt")
)
with content_file.open("r") as f:
    reader = csv.DictReader(f, delimiter="\t")
    for r in reader:
        if r["refsetId"] in exclusion_refset_concepts:
            excluded_concepts[r["refsetId"]].append(
                {"conceptId": r["referencedComponentId"]}
            )

# Write one CSV of excluded concept IDs per exclusion refset
for conceptId, term in exclusion_refset_concepts.items():
    with open(f"{conceptId}_{term.replace(' ', '-')}.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["conceptId"])
        writer.writeheader()
        writer.writerows(excluded_concepts[conceptId])
```
using the
See also #1564
There are approximately six legally restricted code groups that cannot be returned in OpenSAFELY data (referenced in the DPIA), e.g. for termination of pregnancy.
However, this is not yet well documented for users, and it is easy to create a codelist that contains these codes and run a query without any warning that some codes will never match any records. Occasionally this produces a surprising zero-match result that gets noticed, but usually it fails silently, i.e. it produces an incomplete result that can go unnoticed.
If an application clearly depends on these codes, the problem will be picked up at the application stage, but several groups have tried to use these codes as part of a wider study without realising they are restricted.
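To illustrate what a warning could look like, here is a minimal sketch of a pre-run check. It assumes the restricted codes are available as one CSV per group with a `conceptId` column (the shape the prototype in the comments writes out); the function names `load_concept_ids` and `find_restricted_codes` and the directory layout are hypothetical, not existing OpenSAFELY or OpenCodelists APIs.

```python
import csv
from pathlib import Path


def load_concept_ids(csv_path):
    """Read the conceptId column of one CSV into a set of strings."""
    with open(csv_path, newline="") as f:
        return {row["conceptId"] for row in csv.DictReader(f)}


def find_restricted_codes(codelist_codes, restricted_dir):
    """Return {restricted group name: sorted codes from the codelist in that group}.

    restricted_dir is assumed to hold one CSV per restricted group;
    the group name is taken from the CSV filename (hypothetical layout).
    """
    codelist = set(codelist_codes)
    matches = {}
    for csv_path in sorted(Path(restricted_dir).glob("*.csv")):
        overlap = codelist & load_concept_ids(csv_path)
        if overlap:
            matches[csv_path.stem] = sorted(overlap)
    return matches
```

An empty result means no restricted codes were found; a non-empty one could be turned into a warning that names the affected group and codes rather than letting the query silently return nothing for them.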
Possible solutions:
Notes
For all of the solutions, the restricted codelists that we create on OC will require regular (automated) checking to make sure they're kept up to date.
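The automated check mentioned here could be as simple as a set comparison between the codes stored on OC and the codes extracted from the latest release. A rough sketch, assuming both sides are available as iterables of concept ID strings; `refset_drift` is a hypothetical helper name.

```python
def refset_drift(stored_codes, released_codes):
    """Compare a stored restricted codelist with a freshly extracted refset.

    "missing" lists codes in the new release but not in the stored codelist;
    "stale" lists codes in the stored codelist but no longer in the release.
    """
    stored, released = set(stored_codes), set(released_codes)
    return {
        "missing": sorted(released - stored),
        "stale": sorted(stored - released),
    }
```

A scheduled job could run this per restricted group and flag any group where either list is non-empty.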