
Convert SnpPositions.tsv to SnpGcCorrections.tsv

Keiran Raine edited this page Jul 22, 2016 · 2 revisions

Calculates the GC fraction in bins of various sizes around each SNP locus. Depending on the number of SNPs, this can take several hours to run.

$ ascatSnpPanelGcCorrections.pl genome.fa SnpPositions.tsv > SnpGcCorrections.tsv

To speed this up, you can split the input file into chunks and process them in parallel, along these lines:

$ mkdir splitPos splitGc splitGcLogs
$ split --number=l/10 -d SnpPositions.tsv splitPos/snpPos.

(Note: with -d the default suffix length is 2, giving suffixes 00 to 99, so if you split into more than 100 chunks you may need to set --suffix-length appropriately.)
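If you change the number of chunks often, the split step can be wrapped so the suffix width follows the chunk count. This is a minimal sketch assuming bash and GNU coreutils split; the function name split_positions is hypothetical and not part of ascatNgs.

```shell
# Hypothetical helper: split a positions file into N line-aligned chunks,
# sizing the numeric suffix so every chunk name is unique and sorts in order.
# Assumes GNU coreutils split (for --number=l/N and -d).
split_positions() {
  local input=$1 chunks=$2
  local last=$((chunks - 1))
  local width=${#last}            # digits needed for the largest suffix
  (( width < 2 )) && width=2      # never go below split's default of 2
  mkdir -p splitPos splitGc splitGcLogs
  split --number=l/"$chunks" -d --suffix-length="$width" \
    "$input" splitPos/snpPos.
}
```

For example, `split_positions SnpPositions.tsv 16` produces splitPos/snpPos.00 to splitPos/snpPos.15. Above 100 chunks the suffixes become three digits (snpPos.000 and so on), so the snpPos.00 file name used when stitching the results back together would need to change to match.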

Then generate one command per chunk:

$ ls -1 splitPos/ | xargs -I {} echo '(ascatSnpPanelGcCorrections.pl genome.fa splitPos/{} > splitGc/{}) >& splitGcLogs/{}.log &'

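Execute the resulting commands. Each one runs in the background (note the trailing &), so all chunks are processed concurrently, and anything a job writes to stderr is captured in splitGcLogs/.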
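As an alternative to generating and pasting the commands, a bash function can start every chunk directly and then wait on each job, so you know when all of them are complete and which, if any, failed. This is a sketch assuming bash, the splitPos/splitGc/splitGcLogs layout created above, and enough CPU and memory to run every chunk at once. The name run_gc_chunks is hypothetical, and it assumes ascatSnpPanelGcCorrections.pl signals failure with a non-zero exit status.

```shell
# Hypothetical helper: start one ascatSnpPanelGcCorrections.pl job per chunk
# in the background, then wait on each job and report chunks that exited
# non-zero. Requires bash (arrays); assumes splitPos/, splitGc/ and
# splitGcLogs/ already exist.
run_gc_chunks() {
  local genome=$1                 # reference FASTA, e.g. genome.fa
  local -a pids=() names=()
  local chunk name i failed=0

  for chunk in splitPos/snpPos.*; do
    name=${chunk##*/}             # e.g. snpPos.03
    # stdout -> per-chunk result, stderr -> per-chunk log
    ascatSnpPanelGcCorrections.pl "$genome" "$chunk" \
      > "splitGc/$name" 2> "splitGcLogs/$name.log" &
    pids+=("$!")
    names+=("$name")
  done

  # 'wait PID' returns the exit status of that particular job
  for i in "${!pids[@]}"; do
    if ! wait "${pids[i]}"; then
      echo "chunk ${names[i]} failed, see splitGcLogs/${names[i]}.log" >&2
      failed=1
    fi
  done
  return "$failed"
}
```

Usage: `run_gc_chunks genome.fa && echo "all chunks done"`. Because the function only returns after every job has finished, you can move straight on to stitching when it succeeds.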

Once all jobs have completed, stitch the output files back together, keeping only the header line from the first chunk:

$ head -n 1 splitGc/snpPos.00 > SnpGcCorrections.tsv
$ cat splitGc/snpPos.* | grep -vP 'Chr\tPosition' >> SnpGcCorrections.tsv
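The -P option of grep is a GNU extension and is not available in every grep implementation. If that is a problem, the same stitching can be done in a single awk pass. This sketch assumes, as the head/grep commands do, that each chunk's header line contains Chr<TAB>Position and that splitGc/snpPos.00 sorts first; the name stitch_gc_chunks is hypothetical.

```shell
# Hypothetical helper: portable alternative to head + grep -P.
# Keeps line 1 of the first file (its header) and drops every later line
# matching Chr<TAB>Position, i.e. the headers of the remaining chunks.
stitch_gc_chunks() {
  awk 'NR == 1 || !/Chr\tPosition/' splitGc/snpPos.* > SnpGcCorrections.tsv

  # Sanity check: exactly one header line should remain
  local headers
  headers=$(awk '/Chr\tPosition/ { n++ } END { print n + 0 }' SnpGcCorrections.tsv)
  if [ "$headers" -ne 1 ]; then
    echo "expected 1 header line in SnpGcCorrections.tsv, found $headers" >&2
    return 1
  fi
}
```

The header check catches the case where the first chunk's output does not start with a header (for example because that job failed), which the head/grep version would pass through silently.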