Convert SnpPositions.tsv to SnpGcCorrections.tsv
Keiran Raine edited this page Jul 22, 2016 · 2 revisions
Calculates the GC fraction for bins of various sizes around each SNP locus. This can take several hours to run, depending on the number of SNPs.
$ ascatSnpPanelGcCorrections.pl genome.fa SnpPositions.tsv > SnpGcCorrections.tsv
To run the job in parallel, you can split up the input file with something along these lines:
$ mkdir splitPos splitGc splitGcLogs
$ split --number=l/10 -d SnpPositions.tsv splitPos/snpPos.
(Note: if you increase --number=l/10 above 10 then you will need to set --suffix-length appropriately)
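Before launching the jobs it may be worth confirming that the chunks still account for every input line. The following is a minimal sketch, assuming the chunks share a common prefix such as splitPos/snpPos. as above; the function name check_split is hypothetical and not part of the ASCAT tooling:

```shell
# check_split INPUT PREFIX
# Hypothetical helper: succeeds only if the files matching PREFIX* together
# contain exactly as many lines as INPUT.
check_split() {
    input=$1
    prefix=$2
    # $(( ... )) strips any padding some wc implementations add
    total=$(($(wc -l < "$input")))
    pieces=$(($(cat "$prefix"* | wc -l)))
    [ "$total" -eq "$pieces" ]
}
```

For example: check_split SnpPositions.tsv splitPos/snpPos. && echo "split OK"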
Then auto-generate the commands:
$ ls -1 splitPos/ | xargs -I {} echo '(ascatSnpPanelGcCorrections.pl genome.fa splitPos/{} > splitGc/{}) >& splitGcLogs/{}.log &'
Execute the resulting commands.
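As an alternative to copying the echoed commands, the same jobs could be launched from a small shell function that also waits for them to finish. This is a sketch under assumptions: run_chunks is a hypothetical name, the splitPos, splitGc and splitGcLogs directories are the ones created above, and every chunk is started at once, so it suits a machine with at least as many free cores as chunks.

```shell
# run_chunks CMD REFERENCE
# Hypothetical helper: runs "CMD REFERENCE <chunk>" for every file in splitPos/,
# writing stdout to splitGc/<chunk> and stderr to splitGcLogs/<chunk>.log,
# then blocks until all background jobs have exited.
run_chunks() {
    cmd=$1
    ref=$2
    for chunk in splitPos/*; do
        name=$(basename "$chunk")
        ( "$cmd" "$ref" "$chunk" > "splitGc/$name" ) > "splitGcLogs/$name.log" 2>&1 &
    done
    wait
}
```

For example: run_chunks ascatSnpPanelGcCorrections.pl genome.fa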
Once all jobs are complete, stitch the files back together:
$ head -n 1 splitGc/snpPos.00 > SnpGcCorrections.tsv
$ cat splitGc/snpPos.* | grep -vP 'Chr\tPosition' >> SnpGcCorrections.tsv
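A quick line count can confirm that nothing was lost when stitching. The sketch below assumes every per-chunk output begins with exactly one header line, as the grep above implies; check_stitch is a hypothetical helper name, not part of the ASCAT tooling.

```shell
# check_stitch CHUNK_DIR MERGED_FILE
# Hypothetical helper: succeeds only if MERGED_FILE has one header line plus
# the data rows (all lines but the first) of every file in CHUNK_DIR.
check_stitch() {
    dir=$1
    out=$2
    expected=1                           # the single header kept in the merged file
    for f in "$dir"/*; do
        n=$(($(wc -l < "$f")))
        expected=$((expected + n - 1))   # count data rows only
    done
    [ "$(($(wc -l < "$out")))" -eq "$expected" ]
}
```

For example: check_stitch splitGc SnpGcCorrections.tsv && echo "stitch OK"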