cubi-tk snappy pull-sheets has undocumented samplesheet requirements #164

Open
Nicolai-vKuegelgen opened this issue Apr 24, 2023 · 2 comments

Comments

@Nicolai-vKuegelgen
Contributor

Is your feature request related to a problem? Please describe.

When using snappy and cubi-tk snappy pull-sheets for projects not related to medical genetics, other samplesheet templates can be more useful. However, it is not clearly documented which characteristics of the templates are actually necessary for cubi-tk snappy pull-sheets (and by extension snappy) to run without errors.

By trial and error I have now discovered the following constraints:

  • The 'Library Name' column in SODAR needs to contain a substring like '-N1-DNA1-WGS1' that identifies the library type. This requirement may be exclusive to the cubi-tk call and not actually affect snappy.
  • The following source 'Characteristics' columns need to be present and, at least partially, filled with specific non-empty values:
    • Disease Status (allowed: "affected", "carrier", "unaffected", "unknown")
    • Sex (allowed: "female", "male", "unknown")
    • Mother
    • Father
    • Batch
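The constraints listed above could be checked before running pull-sheets with something like the following sketch. It is not part of cubi-tk: the function name `check_row`, the dict-per-row input, and especially the `LIBRARY_SUFFIX` regular expression are assumptions made for illustration, and the real checks in cubi-tk may differ.

```python
import re

# Allowed values as reported in this issue.
ALLOWED = {
    "Disease Status": {"affected", "carrier", "unaffected", "unknown"},
    "Sex": {"female", "male", "unknown"},
}
# Columns that were reported as needing to be present.
REQUIRED_COLUMNS = ["Disease Status", "Sex", "Mother", "Father", "Batch"]
# Hypothetical pattern for suffixes like "-N1-DNA1-WGS1" (an assumption,
# not the pattern cubi-tk actually uses).
LIBRARY_SUFFIX = re.compile(r"-[A-Za-z]+\d+-[A-Za-z]+\d+-[A-Za-z_]+\d+$")


def check_row(row):
    """Return a list of human-readable problems for one sample sheet row (a dict)."""
    problems = []
    # Every required column must exist, even if its value is a placeholder.
    for column in REQUIRED_COLUMNS:
        if column not in row:
            problems.append(f"missing column: {column}")
    # Columns with a restricted vocabulary must hold one of the allowed values.
    for column, allowed in ALLOWED.items():
        value = (row.get(column) or "").strip().lower()
        if column in row and value not in allowed:
            problems.append(f"invalid {column}: {row[column]!r}")
    # The library name must end in a library-type suffix.
    if not LIBRARY_SUFFIX.search(row.get("Library Name") or ""):
        problems.append("Library Name lacks a suffix like '-N1-DNA1-WGS1'")
    return problems
```

Calling `check_row` on each row of an exported sample sheet and printing the returned list would show all problems at once, not only the first one to trigger an error.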

Describe the solution you'd like

Ideally a more flexible template that only requires the minimum information required for snappy to run should be added.

Describe alternatives you've considered

At a minimum these requirements should be documented such that they can be easily looked up.

Additional context
Snappy currently can't handle dashes in Source names: snappy-391
If this cannot be fixed easily on the snappy side, cubi-tk snappy pull-sheets could give a warning in these cases.
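A minimal sketch of such a warning, assuming the source names are available as a list of strings. The helper name `warn_on_dashes` is hypothetical and does not exist in cubi-tk.

```python
import warnings


def warn_on_dashes(source_names):
    """Warn about every source name containing '-' and return those names."""
    offending = [name for name in source_names if "-" in name]
    for name in offending:
        warnings.warn(
            f"Source name {name!r} contains a dash, which snappy may not handle"
        )
    return offending
```

Returning the offending names lets the caller decide whether to abort or merely report them.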

@Nicolai-vKuegelgen changed the title from "cubi-tk snappy pull-sheets has undocumented requirements" to "cubi-tk snappy pull-sheets has undocumented samplesheet requirements" on Apr 24, 2023
@sellth
Contributor

sellth commented Aug 24, 2023

Only Sex and Disease Status are actually required columns, because they are the only ones not given a default value when missing:

    row = [
        source.family or "FAM",
        source.source_name or ".",
        source.father or "0",
        source.mother or "0",
        MAPPING_SEX[source.sex.lower()],
        MAPPING_STATUS[source.affected.lower()],
        sample.library_type or "." if sample else ".",
        sample.folder_name or "." if sample else ".",
        "0" if source.batch_no is None else source.batch_no,
        ".",
        str(project_uuid),
        sample.seq_platform or "." if sample else ".",
        sample.library_kit or "." if sample else ".",
    ]

An easy fix would be to also add an or "." / "0" fallback for these values.
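One way such a fallback could look is sketched below. The contents of `MAPPING_SEX` and `MAPPING_STATUS` shown here are assumptions based on common PED-style codes, not the actual cubi-tk mappings, and `map_with_default` is a hypothetical helper.

```python
# Hypothetical PED-style mappings; the real MAPPING_SEX / MAPPING_STATUS
# in cubi-tk may contain different keys or codes.
MAPPING_SEX = {"male": "1", "female": "2", "unknown": "0"}
MAPPING_STATUS = {"unaffected": "1", "affected": "2", "unknown": "0"}


def map_with_default(mapping, value, default="0"):
    """Map a sheet value to its code, falling back to `default`.

    The fallback is used when the value is missing (None or empty)
    or when it is not a key of the mapping.
    """
    if not value:
        return default
    return mapping.get(str(value).strip().lower(), default)
```

In the row above, `MAPPING_SEX[source.sex.lower()]` would then become `map_with_default(MAPPING_SEX, source.sex)`. Note that silently defaulting unrecognised values can hide typos in the sheet, so logging the fallback may be preferable.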

Also: Duplicate #120

@mbenary

mbenary commented Mar 13, 2024

The column source.affected is only relevant for germline data and thus does not exist in the somatic case, so cubi-tk snappy pull-sheets fails at line 310.
