improve validation script to try to fail early with meaningful messages #56

Closed
borauyar opened this issue Jul 1, 2019 · 2 comments
@borauyar
Member

borauyar commented Jul 1, 2019

  • Check if the settings.yaml is formatted correctly.
  • Check to see if the input GTF file is parseable.
  • Check that the transcript IDs in the cDNA file match the transcript_id field in the GTF file. If they don't, print a warning that the transcript -> gene ID mapping won't work for the salmon results.
  • Check whether the chromosome naming conventions (UCSC vs. NCBI style) agree between the GTF file and the FASTA file.
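
A rough sketch of how the GTF parseability and transcript ID checks could look, in case it helps the discussion. This is not the project's validation script: the function names (`gtf_transcript_ids`, `fasta_ids`, `missing_transcripts`) are made up for illustration, and it assumes that the cDNA FASTA header's first whitespace-delimited token is the transcript ID and must match the GTF `transcript_id` exactly (no version-suffix handling).

```python
import re


def gtf_transcript_ids(gtf_lines):
    """Collect transcript_id values from GTF lines.

    Raises ValueError with the line number as soon as a line does not
    have the 9 tab-separated fields a GTF record needs (fail early).
    """
    ids = set()
    for n, line in enumerate(gtf_lines, start=1):
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.split("\t")
        if len(fields) != 9:
            raise ValueError(
                f"GTF line {n}: expected 9 tab-separated fields, found {len(fields)}"
            )
        match = re.search(r'transcript_id "([^"]+)"', fields[8])
        if match:  # gene-level records carry no transcript_id
            ids.add(match.group(1))
    return ids


def fasta_ids(fasta_lines):
    """Return the first token of every FASTA header line (assumed to be the ID)."""
    return {
        line[1:].split()[0]
        for line in fasta_lines
        if line.startswith(">") and len(line.strip()) > 1
    }


def missing_transcripts(gtf_lines, fasta_lines):
    """Return the cDNA IDs that have no matching transcript_id in the GTF."""
    return sorted(fasta_ids(fasta_lines) - gtf_transcript_ids(gtf_lines))
```

A non-empty result from `missing_transcripts` would be the trigger for the warning about the transcript -> gene ID mapping, while the `ValueError` covers the "is the GTF parseable" case with a message that points at the offending line.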
@alexg9010
Member

Regarding the chromosome naming style check, you could have a look here: https://github.com/BIMSBbioinfo/pigx_chipseq/blob/master/scripts/Check_Config.py#L213-L267
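
For illustration only, a minimal sketch of what a naming style check might look like. This is not the code from the linked Check_Config.py; the helper names (`naming_style`, `check_styles`) are hypothetical, and it uses the simplifying assumption that UCSC-style names all start with `chr` while NCBI-style names never do.

```python
def naming_style(seqnames):
    """Classify sequence names as 'UCSC', 'NCBI', or 'mixed'.

    Simplifying assumption: UCSC names start with 'chr' (chr1, chrM),
    NCBI-style names do not (1, MT).
    """
    names = list(seqnames)
    if not names:
        raise ValueError("no sequence names given")
    with_prefix = sum(1 for name in names if name.startswith("chr"))
    if with_prefix == len(names):
        return "UCSC"
    if with_prefix == 0:
        return "NCBI"
    return "mixed"


def check_styles(gtf_seqnames, fasta_seqnames):
    """Raise ValueError with a readable message if the two styles disagree."""
    gtf_style = naming_style(gtf_seqnames)
    fasta_style = naming_style(fasta_seqnames)
    if gtf_style != fasta_style:
        raise ValueError(
            f"Chromosome naming mismatch: GTF uses {gtf_style} style, "
            f"FASTA uses {fasta_style} style"
        )
    return gtf_style
```

The sequence names would come from the first column of the GTF and from the FASTA header lines; how they are extracted is left out here.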

@borauyar
Member Author

These commits fix the issues with the annotation files:
9036ace, 014571f, efdd4f2, a7fb557
