HiCAT-human is a modified version of our previous higher-order repeat (HOR) annotation tool, HiCAT, that automatically annotates centromere HOR patterns from both HiFi reads and assemblies of multiple human samples.
Python 3.9.13
Development environment: Linux
Development tool: Pycharm
Packages | Version |
---|---|
biopython | 1.79 |
joblib | 1.1.0 |
lastz | 1.04.22 |
matplotlib | 3.5.1 |
numpy | 1.22.3 |
pandas | 1.4.0 |
python-edlib | 1.3.9 |
python-levenshtein | 0.12.2 |
scikit-learn | 1.0.2 |
seqtk | 1.2 |
setuptools | 61.2.0 |
StringDecomposer version 1.1.2.
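To confirm that an environment matches the pinned versions above, a small check along these lines may help. This is a minimal sketch, not part of HiCAT-human: the function name `find_version_mismatches` is hypothetical, and the distribution names are assumed to match the table (conda and pip names can differ, for example for python-edlib and python-levenshtein, and lastz and seqtk are command-line tools rather than Python packages, so they are left out).

```python
from importlib import metadata

# Pinned versions copied from the table above. The distribution names are an
# assumption: conda package names do not always match pip distribution names.
REQUIRED = {
    "biopython": "1.79",
    "joblib": "1.1.0",
    "matplotlib": "3.5.1",
    "numpy": "1.22.3",
    "pandas": "1.4.0",
    "scikit-learn": "1.0.2",
}


def find_version_mismatches(required, get_version=metadata.version):
    """Return {name: (expected, found)} for packages that differ or are missing."""
    mismatches = {}
    for name, expected in required.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            found = None  # package is not installed in this environment
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for name, (expected, found) in find_version_mismatches(REQUIRED).items():
        print(f"{name}: expected {expected}, found {found or 'not installed'}")
```

An empty result means every listed Python package is installed at the pinned version.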
#install
conda install -y --file requirements.txt
cd ./stringdecomposer && make
python HiCAT_human.py only_reads -r INPUT_READS_DIR -rs READS_SAMPLE_FILE -o READS_OUTPUT_DIR -th THREAD
For more details, please use -h.
- input_reads_dir should contain the FASTA-format read file of every sample and the corresponding .fai file:

      sample1.fasta
      sample1.fasta.fai
      sample2.fasta
      sample2.fasta.fai

- reads_sample_file should be a two-column, tab-separated (\t) file that records each sample name and its gender:

      sample1	male
      sample2	female

- reads_output_dir is the output directory.
- th is the number of threads.
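The input requirements above can be checked before starting a run. The following is a minimal sketch under stated assumptions, not part of HiCAT-human: the helper names `missing_fai_indexes` and `read_sample_file` are hypothetical, read files are assumed to end in `.fasta`, and `male` and `female` are assumed to be the only accepted gender values because they are the only ones shown in the example.

```python
from pathlib import Path


def missing_fai_indexes(reads_dir):
    """Return names of .fasta files in reads_dir that lack a matching .fai file."""
    reads_dir = Path(reads_dir)
    return sorted(
        fasta.name
        for fasta in reads_dir.glob("*.fasta")
        if not Path(str(fasta) + ".fai").exists()
    )


def read_sample_file(path):
    """Parse a two-column, tab-separated sample file into (name, gender) pairs."""
    samples = []
    with open(path) as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.rstrip("\r\n")
            if not line.strip():
                continue  # skip blank lines
            fields = line.split("\t")
            if len(fields) != 2:
                raise ValueError(
                    f"line {line_number}: expected 2 tab-separated fields"
                )
            name, gender = fields
            # Assumption: only the two values from the README example are valid.
            if gender not in ("male", "female"):
                raise ValueError(
                    f"line {line_number}: unexpected gender {gender!r}"
                )
            samples.append((name, gender))
    return samples
```

Calling `missing_fai_indexes("INPUT_READS_DIR")` and `read_sample_file("READS_SAMPLE_FILE")` before launching the tool surfaces a missing index or a space-separated sample file early, rather than partway through annotation.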
Reads with assembly (a combination of reads annotation, reads aggregation, assembly annotation, and assembly matching to reads)
python HiCAT_human.py reads_with_assembly -r INPUT_READS_DIR -rs READS_SAMPLE_FILE -o READS_OUTPUT_DIR -th THREAD -a INPUT_ASSEMBLY_DIR -as ASSEMBLY_SAMPLE_FILE
For more details, please use -h.
- input_reads_dir should contain the FASTA-format read file of every sample and the corresponding .fai file.
- reads_sample_file should be a two-column, tab-separated (\t) file that records each sample name and its gender.
- reads_output_dir is the output directory.
- th is the number of threads.
- input_assembly_dir should contain the FASTA-format assembly file of every sample and the corresponding .fai file:

      assembly1.fasta
      assembly1.fasta.fai
      assembly2.fasta
      assembly2.fasta.fai

  The name of each chromosome in an assembly file should start with 'chr':

      >chr1
      ACGTACGTACGTCAGATCTACGCATAGTGTGCTA...
      >chr2
      CACAGTGGTGGTGTGGGTTACTACACA...

- assembly_sample_file should be a text file that records one sample name per line:

      assembly1
      assembly2
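The assembly requirements above (chromosome names starting with 'chr', one sample name per line) can also be verified up front. This is a minimal sketch, not part of HiCAT-human: the helper names `non_chr_sequence_names` and `read_assembly_sample_file` are hypothetical, and the record name is assumed to be the first whitespace-delimited token of each FASTA header line.

```python
def non_chr_sequence_names(fasta_path):
    """Return FASTA record names in fasta_path that do not start with 'chr'."""
    bad_names = []
    with open(fasta_path) as handle:
        for line in handle:
            if line.startswith(">"):
                # Assumption: the name is the first token after '>'.
                fields = line[1:].split()
                name = fields[0] if fields else ""
                if not name.startswith("chr"):
                    bad_names.append(name)
    return bad_names


def read_assembly_sample_file(path):
    """Return the sample names listed one per line, ignoring blank lines."""
    with open(path) as handle:
        return [line.strip() for line in handle if line.strip()]
```

An empty list from `non_chr_sequence_names` means every record in that assembly follows the naming rule; any returned names would need to be renamed before running the reads_with_assembly mode.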
If you have any questions, please feel free to contact: [email protected], [email protected], [email protected]
Please cite the following paper when you use HiCAT-human in your work:
Shenghan Gao, Yimeng Zhang, Stephen James Bush, Bo Wang, Xiaofei Yang, Kai Ye. bioRxiv 2024.01.26.577337; doi: https://doi.org/10.1101/2024.01.26.577337