Sample file for ukbb #1

yoonsucho opened this issue Nov 10, 2022 · 0 comments

Making bgen files for sub-populations using qctool:

# Load qctool
module add apps/qctool/2.2.0
# Generate subset for CHR 1 (test)
qctool \
-g /mnt/storage/private/mrcieu/data/ukbiobank/genetic/variants/arrays/imputed/released/2018-09-18/data/raw_downloaded/bgen/ukb_imp_chr1_v3.bgen \
-s /mnt/storage/private/mrcieu/data/ukbiobank/genetic/variants/arrays/imputed/released/2018-09-18/data/sample-files/data.chr01.sample \
-incl-samples /user/home/yc16575/scratch/UKBB_pops/IDs_each_continent/UKB_IDs_80Perc_AFR_Anc.txt \
-og /mnt/storage/scratch/yc16575/bp_mr_drug/geno/ukb_imp_chr1_v3_afr.bgen

This returned the following error message:

!! ERROR (genfile::MalformedInputError): the sample file "/mnt/storage/private/mrcieu/data/ukbiobank/genetic/variants/arrays/imputed/released/2018-09-18/data/sample-files/data.chr01.sample" is malformed on line 17, column 1.  Quitting.

Error (genfile::MalformedInputError): Source "/mnt/storage/private/mrcieu/data/ukbiobank/genetic/variants/arrays/imputed/released/2018-09-18/data/sample-files/data.chr01.sample" is malformed on line 17, column 1..

The beginning of data.chr01.sample looks like this:

# Analysis: "qctool analysis"
#  started: 2018-07-11 13:49:04
# 
# Analysis properties:
#   -bgen-bits 8 (user-supplied)
#   -bgen-compression zlib (user-supplied)
#   -excl-samples /mnt/storage/private/mrcieu/research/UKBIOBANK_Array_Genotypes_500k_HRC_Imputation_March2018/data//derived/filtered/snp-stats-europeans//individual_exclusions.txt (user-supplied)
#   -g /mnt/storage/private/mrcieu/research/UKBIOBANK_Array_Genotypes_500k_HRC_Imputation_March2018/data//derived/filtered/snp_filtered_bgen//data.chr1.bgen (user-supplied)
#   -og /mnt/storage/private/mrcieu/research/UKBIOBANK_Array_Genotypes_500k_HRC_Imputation_March2018/data//dosage_bgen/data.chr1.bgen (user-supplied)
#   -osample /mnt/storage/private/mrcieu/research/UKBIOBANK_Array_Genotypes_500k_HRC_Imputation_March2018/data//sample-files/data.1.sample (user-supplied)
#   -osnp /mnt/storage/private/mrcieu/research/UKBIOBANK_Array_Genotypes_500k_HRC_Imputation_March2018/data//snp-stats/data.1.snp-stats (user-supplied)
#   -s /mnt/storage/private/mrcieu/research/UKBIOBANK_Array_Genotypes_500k_HRC_Imputation_March2018/data//id_mapping/data.chr1-22.sample (user-supplied)
#   -sample-stats  (user-supplied)
#   -snp-stats  (user-supplied)
# 
sample  index   missing_proportion      missing_call_proportion heterozygous_proportion heterozygous_call_proportion
IEU6189432      0       0       0.0184638       0.195346        0.190014
IEU3007655      1       0       0.0214294       0.206779        0.200754
IEU3987068      2       0       0.0182489       0.208061        0.202874
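
Counting the lines in the excerpt above (treating the two wrapped paths as single lines), the column header is line 16 and the first data row is line 17, so the reported position may simply be where qctool expected a column type row. A minimal sketch for inspecting a reported line follows. It uses a small stand-in file rather than the real sample file, and `show_line` is a hypothetical helper, not part of qctool.

```shell
# Stand-in file (hypothetical contents): two comment lines, a header, one data row.
tmp=$(mktemp -d)
printf '# Analysis: "qctool analysis"\n#\nsample\tindex\nIEU6189432\t0\n' > "$tmp/demo.sample"

# Print line N of a file, prefixed with its line number.
show_line() {
  awk -v n="$1" 'NR == n { print NR ": " $0 }' "$2"
}

# In the stand-in file, line 4 is the first data row.
show_line 4 "$tmp/demo.sample"
```

On the real file the equivalent call would be `show_line 17` followed by the path to data.chr01.sample.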

I first removed all lines starting with # using the following command:

sed '/^#/d' /mnt/storage/private/mrcieu/data/ukbiobank/genetic/variants/arrays/imputed/released/2018-09-18/data/sample-files/data.chr01.sample > /mnt/storage/scratch/yc16575/bp_mr_drug/geno/sample/data.chr01.sample

Then I inserted a column type line below the header, following the sample file format documentation for qctool.

Now the sample file for CHR 1 looks like:

sample  index   missing_proportion      missing_call_proportion heterozygous_proportion heterozygous_call_proportion
0       C       C       C       C       C
IEU6189432      0       0       0.0184638       0.195346        0.190014
IEU3007655      1       0       0.0214294       0.206779        0.200754
IEU3987068      2       0       0.0182489       0.208061        0.202874
IEU7396518      3       0       0.0179826       0.197685        0.192267
IEU8375037      4       0       0.0189953       0.196003        0.190471
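
The two edits (dropping the comment lines and adding the type row) could also be done in one pass. The sketch below is only an illustration: it runs on a shortened stand-in file, `fix_sample` is a hypothetical helper name, and it assumes, as in the file above, that the first column is the ID column (type 0) and every other column is continuous (type C).

```shell
# Stand-in input (hypothetical, shortened to 3 columns) mimicking the original layout.
tmp=$(mktemp -d)
printf '# comment\nsample\tindex\tmissing_proportion\nIEU6189432\t0\t0\n' > "$tmp/in.sample"

# Drop comment lines, then emit a type row right after the header:
# "0" for the ID column and "C" for every other column (assumption, see above).
fix_sample() {
  awk 'BEGIN { FS = "[ \t]+"; OFS = "\t" }
       /^#/ { next }
       !header_done {
         print
         row = "0"
         for (i = 2; i <= NF; i++) row = row OFS "C"
         print row
         header_done = 1
         next
       }
       { print }' "$1"
}

fix_sample "$tmp/in.sample" > "$tmp/out.sample"
cat "$tmp/out.sample"
```

Generating the type row from the header's field count keeps the two rows the same width if the set of columns ever changes.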

With this sample file, qctool returned the following error:

Welcome to qctool
(version: 2.2.0, revision: unknown)

(C) 2009-2020 University of Oxford

Opening genotype files                                      : [******************************] (1/1,-7.8s,-0.1/s)
terminate called after throwing an instance of 'genfile::MismatchError'
  what():  genfile::MismatchError
Aborted
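
The output does not say what mismatched. One possibility, not confirmed by anything above, is that the sample file and the bgen file contain different numbers of samples: the comment header shows the sample file was written by a run that used -excl-samples on a filtered bgen, whereas the -g input here is the raw downloaded bgen. A rough way to compare the two counts is sketched below on stand-in files. `bgen_n` and `sample_n` are hypothetical helpers, and the byte offset follows my reading of the BGEN header layout (sample count stored as a 32-bit unsigned integer after the offset, header length and variant count fields), so treat it as an assumption to verify against the format specification.

```shell
# Stand-in files (hypothetical): a 20-byte fake bgen header declaring 5 samples,
# and a sample file with a header row, a type row and 2 data rows.
tmp=$(mktemp -d)
printf '\024\000\000\000\024\000\000\000\002\000\000\000\005\000\000\000bgen' > "$tmp/demo.bgen"
printf 'sample\tindex\n0\tC\nIEU6189432\t0\nIEU3007655\t1\n' > "$tmp/demo.sample"

# Sample count declared in the bgen header: unsigned 32-bit integer at byte offset 12.
# ASSUMPTION: little-endian host, since od reads integers in host byte order.
bgen_n() {
  od -An -t u4 -j 12 -N 4 "$1" | tr -d ' \n'
}

# Number of data rows in a sample file: non-comment lines minus header and type row.
sample_n() {
  grep -vc '^#' "$1" | awk '{ print $1 - 2 }'
}

echo "bgen samples: $(bgen_n "$tmp/demo.bgen")"
echo "sample rows:  $(sample_n "$tmp/demo.sample")"
```

If the two numbers differ on the real files, the sample file probably belongs to a different (filtered) bgen than the one passed with -g.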