Build configuration for CAMES SARS-CoV-2 hosted at https://nextstrain.org/community/AlbertRockG/ncov-cames.
Welcome to the repository for the SARS-CoV-2 genomic epidemiology analysis conducted for the Also-Covid-19 Project in collaboration with the African and Malagasy Council of Higher Education (CAMES).
This project aims to provide comprehensive build configurations for CAMES SARS-CoV-2, hosted at Nextstrain Community Builds. The analysis focuses on understanding the genomic epidemiology of SARS-CoV-2 in African and Malagasy regions.
Choose your preferred environment for running the analysis. Refer to the Nextstrain documentation for installing the necessary packages.
git clone https://github.com/AlbertRockG/ncov-cames.git
git clone https://github.com/nextstrain/ncov.git
cd ncov
cp -r ../ncov-cames/cames_profile .
cp ../ncov-cames/builds_cames.yaml .
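The two copy steps above can be wrapped in a small helper that fails early when a path is wrong. This is only a sketch: the function name `install_cames_config` is hypothetical, and it assumes both repositories are already cloned locally.

```shell
#!/usr/bin/env bash
# Copy the CAMES profile and build config into an existing ncov checkout.
# Usage: install_cames_config <path-to-ncov-cames> <path-to-ncov>
# (hypothetical helper, not part of either repository)
install_cames_config() {
    local cames_dir="$1" ncov_dir="$2"

    # Both items copied by the manual steps must exist in the source repo.
    if [ ! -d "${cames_dir}/cames_profile" ] || [ ! -f "${cames_dir}/builds_cames.yaml" ]; then
        echo "cames_profile/ or builds_cames.yaml not found in ${cames_dir}" >&2
        return 1
    fi

    # The destination must be an existing directory (the ncov clone).
    if [ ! -d "${ncov_dir}" ]; then
        echo "ncov checkout not found at ${ncov_dir}" >&2
        return 1
    fi

    cp -r "${cames_dir}/cames_profile" "${ncov_dir}/" &&
        cp "${cames_dir}/builds_cames.yaml" "${ncov_dir}/"
}
```

With both clones side by side, `install_cames_config ncov-cames ncov` has the same effect as the `cp` commands above.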
Log in to GISAID (gisaid.org) and select the "EpiCoV" link from the navigation.
Select the "Downloads" link from the EpiCoV navigation bar. Scroll to the "Genomic epidemiology" section and select the "nextregions" button. Select the "Africa" button. Save the file as hcov_africa.tar.gz in the ncov/data/ workflow directory.
Click "Back" to return to the main "Download" dialog, find the "Download packages" section, and select the "FASTA" button. Save the full GISAID sequences as data/sequences_fasta.tar.xz.
Select the "metadata" button from that same "Download packages" section and download the corresponding file as data/metadata_tsv.tar.xz.
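Before moving on, it can help to confirm that the three downloads landed where the later commands expect them. The sketch below assumes the file names used in the steps above; the function name `missing_downloads` is hypothetical.

```shell
#!/usr/bin/env bash
# Print the name of each expected GISAID download that is missing
# from the given data directory (defaults to ./data).
# (hypothetical helper; file names are taken from the steps above)
missing_downloads() {
    local data_dir="${1:-data}"
    local f
    for f in hcov_africa.tar.gz sequences_fasta.tar.xz metadata_tsv.tar.xz; do
        [ -f "${data_dir}/${f}" ] || echo "${f}"
    done
}
```

Run from the ncov directory, `missing_downloads data` prints nothing when all three files are in place.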
From an existing nextstrain conda environment, install extra tools to extract data from GISAID files.
# Install tsv-utils and seqkit to filter metadata and extract sequences.
# You only need to do this once.
conda activate nextstrain
conda install -c conda-forge -c bioconda tsv-utils seqkit
Extract African metadata and sequences from full GISAID downloads.
# Get metadata for Africa directly from the tarball.
tar xOf data/metadata_tsv.tar.xz metadata.tsv \
    | tsv-filter -H --str-in-fld Location:Africa \
    | xz -c -2 > data/metadata_africa.tsv.xz
# Get strain names for genomes.
# GISAID FASTA headers use virus name, collection date, and submission date,
# delimited by a pipe character.
xz -c -d data/metadata_africa.tsv.xz \
    | tsv-select -H -f 'Virus\ name','Collection\ date','Submission\ date' \
    | sed 1d \
    | awk -F "\t" '{ print $1"|"$2"|"$3 }' > data/strains_africa.txt
# Get genomes for strain names from the tarball.
# seqkit reads the FASTA stream from stdin and matches full sequence names (-n).
tar xOf data/sequences_fasta.tar.xz sequences.fasta \
    | seqkit grep -n -f data/strains_africa.txt \
    | xz -c -2 > data/sequences_africa.fasta.xz
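To make the strain-name step easier to follow, here is an awk-only sketch of the same idea: it looks up the three columns by header name and joins them with pipes. It is a variation for illustration, not the pipeline used above; the function name `gisaid_strain_names` is hypothetical, and it assumes all three columns are present in the header row.

```shell
#!/usr/bin/env bash
# Build GISAID-style sequence names ("Virus name|Collection date|Submission date")
# from a tab-separated metadata table read on stdin.
# (hypothetical helper; assumes the three named columns exist)
gisaid_strain_names() {
    awk -F '\t' '
        NR == 1 {
            # Header row: remember the position of each column name.
            for (i = 1; i <= NF; i++) col[$i] = i
            next
        }
        {
            # Data rows: emit the three fields joined by pipes.
            print $(col["Virus name"]) "|" $(col["Collection date"]) "|" $(col["Submission date"])
        }
    '
}
```

For example, `xz -c -d data/metadata_africa.tsv.xz | gisaid_strain_names` should produce the same kind of list as data/strains_africa.txt, without depending on column order.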
Navigate to the ncov workflow directory; these instructions assume it is a sibling directory of this repository. By default, the following command will run builds for all CAMES countries, CAMES, and the African regions.
nextstrain build \
--cpus 4 \
--memory 8Gib \
. \
--configfile builds_cames.yaml \
--config active_builds=CAMES
To visualize your builds, follow the Auspice installation tutorial to install the Auspice web app. Then run from the ncov directory:
nextstrain view auspice
A heartfelt thank you to the PTR-SANTE CAMES Board and Dr Luc Salako Djogbenou for their unwavering support and valuable contributions to this initiative.