Joseph Kuo committed Feb 21, 2024
1 parent 22817f7, commit 9c92502
Showing 4 changed files with 82 additions and 75 deletions.
@@ -1,9 +1,10 @@
-==============
-Usage Examples
-==============
+=========
+Tutorials
+=========

-Here is a list of some possible usage cases with GenomKit.
+Here is a list of some tutorials for GenomKit in handling various bioinformatic tasks.

 .. toctree::
    examples_fastq
-   examples_bed
+   examples_bed
+   examples_gtf
@@ -0,0 +1,70 @@
========================
Starting with a GTF file
========================

Because ``GAnnotation`` handles both ``GTF`` and ``GFF`` files, you can substitute a GFF file for the GTF file in any of the tutorials below. Only GTF files are shown here.

Extract exon, intron, and intergenic regions in BED format from a GTF file
----------------------------------------------------------------------------

``GTF_hg38`` is the path to the hg38 GTF annotation file. The goal is to generate the three BED files below:

- hg38_exons.bed
- hg38_introns.bed
- hg38_intergenic_regions.bed

.. code-block:: python

    from genomkit import GRegions
    from genomkit import GAnnotation

    # GTF_hg38 holds the path to the hg38 GTF file
    gtf = GAnnotation(file_path=GTF_hg38, file_format="gtf")
    genes = gtf.get_regions(element_type="gene")
    exons = gtf.get_regions(element_type="exon")
    # Introns: gene bodies with the exons removed
    introns = genes.subtract(exons, inplace=False)
    # Intergenic regions: whole chromosomes with the genes removed
    chromosomes = GRegions(name="chromosomes")
    chromosomes.get_chromosomes(organism="hg38")
    intergenic_regions = chromosomes.subtract(genes, inplace=False)
    # Write each region set to its own BED file
    exons.write(filename="hg38_exons.bed")
    introns.write(filename="hg38_introns.bed")
    intergenic_regions.write(filename="hg38_intergenic_regions.bed")

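Both subtractions in this tutorial are interval set differences. As a rough illustration of the idea, and not of how GenomKit implements ``subtract``, the sketch below removes a set of blocking intervals from a set of regions on a single chromosome. The function name ``subtract_intervals``, the half-open ``(start, end)`` tuples, and the toy coordinates are all assumptions made for this example.

```python
def subtract_intervals(regions, blockers):
    """Return the parts of `regions` not covered by any interval in `blockers`.

    All intervals are half-open (start, end) tuples on the same chromosome.
    """
    blockers = sorted(blockers)
    result = []
    for start, end in sorted(regions):
        cursor = start  # left edge of the part not yet examined
        for b_start, b_end in blockers:
            if b_end <= cursor:
                continue  # blocker lies entirely to the left
            if b_start >= end:
                break  # blockers are sorted, so none of the rest overlap
            if b_start > cursor:
                result.append((cursor, b_start))  # uncovered gap
            cursor = max(cursor, b_end)
            if cursor >= end:
                break
        if cursor < end:
            result.append((cursor, end))  # uncovered tail
    return result


# Toy data: one gene with three exons on a 1,500 bp chromosome
chromosome = [(0, 1500)]
gene = [(100, 1000)]
exons = [(100, 200), (400, 500), (900, 1000)]

introns = subtract_intervals(gene, exons)
intergenic = subtract_intervals(chromosome, gene)
print(introns)
print(intergenic)
```

With real annotations the same operation has to be applied per chromosome, which is the bookkeeping a region container takes care of.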
Get all promoter regions in BED format from a GTF file
------------------------------------------------------

``GTF_hg38`` is the path to the hg38 GTF annotation file. The goal is to generate one BED file, ``hg38_promoters.bed``, in which each gene is resized to the 2,000 bp upstream of its 5' end:

.. code-block:: python

    from genomkit import GAnnotation

    # GTF_hg38 holds the path to the hg38 GTF file
    gtf = GAnnotation(file_path=GTF_hg38, file_format="gtf")
    genes = gtf.get_regions(element_type="gene")
    # Resize each gene around its 5' end: 2,000 bp upstream, nothing downstream
    promoters = genes.resize(extend_upstream=2000,
                             extend_downstream=0,
                             center="5prime", inplace=False)
    promoters.write(filename="hg38_promoters.bed")

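A promoter window has to respect the strand, because "upstream" points toward lower coordinates for genes on the plus strand and toward higher coordinates for genes on the minus strand. The sketch below shows that arithmetic on plain tuples. It assumes that the 5' end is the gene start on ``+`` and the gene end on ``-``, uses 0-based half-open coordinates, and is not taken from GenomKit's ``resize``; the function name ``promoter_interval`` and the coordinates are hypothetical.

```python
def promoter_interval(start, end, strand, upstream=2000, downstream=0):
    """Return the (start, end) window around the 5' end of a gene.

    Coordinates are 0-based and half-open. The window is clipped at 0,
    but not at the chromosome end, which this sketch does not know.
    """
    if strand == "+":
        five_prime = start
        return max(0, five_prime - upstream), five_prime + downstream
    if strand == "-":
        five_prime = end
        return max(0, five_prime - downstream), five_prime + upstream
    raise ValueError(f"unknown strand: {strand!r}")


plus_promoter = promoter_interval(5000, 9000, "+")
minus_promoter = promoter_interval(5000, 9000, "-")
clipped_promoter = promoter_interval(500, 900, "+")
print(plus_promoter, minus_promoter, clipped_promoter)
```

The third call shows why clipping matters: a gene that starts within 2,000 bp of the chromosome start would otherwise get a negative coordinate, which is not valid in a BED file.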
Extract the genes by their biotypes from a GTF file
---------------------------------------------------

``GTF_hg38`` is the path to the hg38 GTF annotation file. The goal is to generate one BED file for each of the biotypes below:

- protein_coding
- lncRNA
- snRNA
- miRNA

.. code-block:: python

    from genomkit import GAnnotation

    # GTF_hg38 holds the path to the hg38 GTF file
    gtf = GAnnotation(file_path=GTF_hg38, file_format="gtf")
    target_biotypes = ["protein_coding", "lncRNA", "snRNA", "miRNA"]
    for biotype in target_biotypes:
        # Keep only the genes whose gene_type attribute matches the biotype
        genes = gtf.get_regions(element_type="gene",
                                attribute="gene_type", value=biotype)
        genes.write(filename=f"hg38_genes_{biotype}.bed")
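
To make the attribute filter concrete, the sketch below does the same selection by hand on raw GTF lines, without GenomKit. It assumes GENCODE-style attributes, where the biotype is stored under ``gene_type`` (Ensembl GTFs use ``gene_biotype`` instead). The helper names, the toy records, and the choice of BED columns are assumptions for illustration only.

```python
import re

# GTF attributes look like: key "value"; key "value";
ATTRIBUTE_PATTERN = re.compile(r'(\S+) "([^"]*)"')


def parse_attributes(field):
    """Turn the ninth GTF column into a dict of attribute names and values."""
    return dict(ATTRIBUTE_PATTERN.findall(field))


def genes_to_bed(gtf_lines, biotype, attribute="gene_type"):
    """Return BED-like tuples for gene records whose attribute equals `biotype`."""
    bed_rows = []
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip header comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "gene":
            continue  # keep gene records only
        attrs = parse_attributes(cols[8])
        if attrs.get(attribute) != biotype:
            continue
        # GTF is 1-based and inclusive; BED is 0-based and half-open
        bed_rows.append((cols[0], int(cols[3]) - 1, int(cols[4]),
                         attrs.get("gene_name", "."), ".", cols[6]))
    return bed_rows


# Toy records, not real annotations
toy_gtf = [
    "\t".join(["chr1", "toy", "gene", "101", "500", ".", "+", ".",
               'gene_id "G1"; gene_type "protein_coding"; gene_name "GENE_A";']),
    "\t".join(["chr1", "toy", "exon", "101", "200", ".", "+", ".",
               'gene_id "G1"; gene_type "protein_coding"; gene_name "GENE_A";']),
    "\t".join(["chr2", "toy", "gene", "1001", "1300", ".", "-", ".",
               'gene_id "G2"; gene_type "lncRNA"; gene_name "GENE_B";']),
]

coding = genes_to_bed(toy_gtf, "protein_coding")
lncrna = genes_to_bed(toy_gtf, "lncRNA")
print(coding)
print(lncrna)
```

If a biotype yields an empty BED file on a real annotation, the attribute name is the first thing worth checking against the GTF source.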