update docs
Joseph Kuo committed Feb 21, 2024
1 parent 22817f7 commit 9c92502
Showing 4 changed files with 82 additions and 75 deletions.
11 changes: 6 additions & 5 deletions docs/source/examples.rst
@@ -1,9 +1,10 @@
==============
Usage Examples
==============
=========
Tutorials
=========

Here is a list of some possible usage cases with GenomKit.
The tutorials below show how to use GenomKit for various bioinformatics tasks.

.. toctree::

    examples_fastq
    examples_bed
    examples_gtf
70 changes: 3 additions & 67 deletions docs/source/examples_bed.rst
@@ -1,70 +1,6 @@
=======================
Examples with BED files
=======================

Extract exon, intron, and intergenic regions in BED format from a GTF file
--------------------------------------------------------------------------

``GTF_hg38`` is the path to the hg38 GTF file for annotation. Now you want to generate 3 BED files as below:

- hg38_exons.bed
- hg38_introns.bed
- hg38_intergenic_regions.bed

.. code-block:: python

    from genomkit import GRegions
    from genomkit import GAnnotation

    # GTF_hg38 must hold the path to the hg38 GTF annotation file
    gtf = GAnnotation(file_path=GTF_hg38, file_format="gtf")
    genes = gtf.get_regions(element_type="gene")
    exons = gtf.get_regions(element_type="exon")
    # Introns: gene regions minus exons
    introns = genes.subtract(exons, inplace=False)
    # Intergenic regions: whole chromosomes minus gene regions
    chromosomes = GRegions(name="chromosomes")
    chromosomes.get_chromosomes(organism="hg38")
    intergenic_regions = chromosomes.subtract(genes, inplace=False)
    exons.write(filename="hg38_exons.bed")
    introns.write(filename="hg38_introns.bed")
    intergenic_regions.write(filename="hg38_intergenic_regions.bed")

Get all promoter regions in BED format from a GTF file
------------------------------------------------------

``GTF_hg38`` is the path to the hg38 GTF file for annotation. Now you want to generate a single BED file, ``hg38_promoters.bed``, containing the promoter regions:

.. code-block:: python

    from genomkit import GAnnotation

    gtf = GAnnotation(file_path=GTF_hg38, file_format="gtf")
    genes = gtf.get_regions(element_type="gene")
    # Promoter window: 2000 bp upstream and 0 bp downstream, relative to the 5' end
    promoters = genes.resize(extend_upstream=2000,
                             extend_downstream=0,
                             center="5prime", inplace=False)
    promoters.write(filename="hg38_promoters.bed")

Extract the genes by their biotypes from a GTF file
---------------------------------------------------

``GTF_hg38`` is the path to the hg38 GTF file for annotation. Now you want to generate BED files for the biotypes below:

- protein_coding
- lncRNA
- snRNA
- miRNA

.. code-block:: python

    from genomkit import GAnnotation

    gtf = GAnnotation(file_path=GTF_hg38, file_format="gtf")
    target_biotypes = ["protein_coding", "lncRNA", "snRNA", "miRNA"]
    for biotype in target_biotypes:
        # Select only the genes whose gene_type attribute matches this biotype
        genes = gtf.get_regions(element_type="gene",
                                attribute="gene_type", value=biotype)
        genes.write(filename="hg38_genes_" + biotype + ".bed")

========================
Starting with a BED file
========================

Get the sequences in FASTA format from a BED file
-------------------------------------------------
6 changes: 3 additions & 3 deletions docs/source/examples_fastq.rst
@@ -1,6 +1,6 @@
=========================
Examples with FASTQ files
=========================
==========================
Starting with a FASTQ file
==========================

Trimming a FASTQ file for both sequences and quality
----------------------------------------------------
70 changes: 70 additions & 0 deletions docs/source/examples_gtf.rst
@@ -0,0 +1,70 @@
========================
Starting with a GTF file
========================

``GAnnotation`` handles both GTF and GFF files, so you can substitute a GFF file for the GTF file in any of the tutorials below. Only GTF files are shown here.

Extract exon, intron, and intergenic regions in BED format from a GTF file
--------------------------------------------------------------------------

``GTF_hg38`` is the path to the hg38 GTF file for annotation. Now you want to generate 3 BED files as below:

- hg38_exons.bed
- hg38_introns.bed
- hg38_intergenic_regions.bed

.. code-block:: python

    from genomkit import GRegions
    from genomkit import GAnnotation

    # GTF_hg38 must hold the path to the hg38 GTF annotation file
    gtf = GAnnotation(file_path=GTF_hg38, file_format="gtf")
    genes = gtf.get_regions(element_type="gene")
    exons = gtf.get_regions(element_type="exon")
    # Introns: gene regions minus exons
    introns = genes.subtract(exons, inplace=False)
    # Intergenic regions: whole chromosomes minus gene regions
    chromosomes = GRegions(name="chromosomes")
    chromosomes.get_chromosomes(organism="hg38")
    intergenic_regions = chromosomes.subtract(genes, inplace=False)
    exons.write(filename="hg38_exons.bed")
    introns.write(filename="hg38_introns.bed")
    intergenic_regions.write(filename="hg38_intergenic_regions.bed")

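
The tutorial above derives introns as "genes minus exons" and intergenic regions as "chromosomes minus genes". To make that set operation concrete, here is a minimal pure-Python sketch of interval subtraction on a single chromosome. It is only an illustration, not GenomKit's implementation: the function name ``subtract_intervals`` and the toy coordinates are hypothetical, and intervals are assumed to be 0-based and half-open as in BED.

```python
def subtract_intervals(regions, blockers):
    """Return the parts of `regions` that are not covered by `blockers`.

    Both arguments are lists of (start, end) tuples on the same chromosome,
    0-based and half-open. Hypothetical helper for illustration only.
    """
    blockers = sorted(blockers)
    result = []
    for start, end in sorted(regions):
        cursor = start  # left edge of the part not yet accounted for
        for b_start, b_end in blockers:
            if b_end <= cursor:
                continue  # blocker lies entirely to the left
            if b_start >= end:
                break  # blocker (and all later ones) lies to the right
            if b_start > cursor:
                result.append((cursor, b_start))  # uncovered gap
            cursor = max(cursor, b_end)
            if cursor >= end:
                break
        if cursor < end:
            result.append((cursor, end))  # uncovered tail
    return result


# Toy gene with three exons; what remains after subtraction are the introns.
gene = [(100, 1000)]
exons = [(100, 200), (400, 500), (900, 1000)]
introns = subtract_intervals(gene, exons)
print(introns)
```

The same helper applied to a whole chromosome and its gene intervals would yield the intergenic gaps, which mirrors the second ``subtract`` call in the tutorial.
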
Get all promoter regions in BED format from a GTF file
------------------------------------------------------

``GTF_hg38`` is the path to the hg38 GTF file for annotation. Now you want to generate a single BED file, ``hg38_promoters.bed``, containing the promoter regions:

.. code-block:: python

    from genomkit import GAnnotation

    gtf = GAnnotation(file_path=GTF_hg38, file_format="gtf")
    genes = gtf.get_regions(element_type="gene")
    # Promoter window: 2000 bp upstream and 0 bp downstream, relative to the 5' end
    promoters = genes.resize(extend_upstream=2000,
                             extend_downstream=0,
                             center="5prime", inplace=False)
    promoters.write(filename="hg38_promoters.bed")

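
A promoter window has to respect strand: the 5' end of a gene on the minus strand is its highest coordinate, so "upstream" points toward larger positions. The sketch below shows one plausible reading of the ``resize`` call above in plain Python. It is an assumption about the intended behavior, not a description of GenomKit's code; ``promoter_interval`` is a hypothetical helper and coordinates are assumed to be 0-based and half-open.

```python
def promoter_interval(start, end, strand, upstream=2000):
    """Return the (start, end) window of `upstream` bp before the 5' end.

    Hypothetical helper for illustration; coordinates are 0-based, half-open.
    """
    if strand == "-":
        # Minus strand: the 5' end is `end`, upstream means larger coordinates.
        return (end, end + upstream)
    # Plus strand: the 5' end is `start`; clamp at the chromosome start.
    return (max(0, start - upstream), start)


plus_promoter = promoter_interval(5000, 8000, "+")
minus_promoter = promoter_interval(5000, 8000, "-")
print(plus_promoter, minus_promoter)
```

Note that this sketch does not clamp minus-strand windows at the chromosome end, since that would require the chromosome length.
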
Extract the genes by their biotypes from a GTF file
---------------------------------------------------

``GTF_hg38`` is the path to the hg38 GTF file for annotation. Now you want to generate BED files for the biotypes below:

- protein_coding
- lncRNA
- snRNA
- miRNA

.. code-block:: python

    from genomkit import GAnnotation

    gtf = GAnnotation(file_path=GTF_hg38, file_format="gtf")
    target_biotypes = ["protein_coding", "lncRNA", "snRNA", "miRNA"]
    for biotype in target_biotypes:
        # Select only the genes whose gene_type attribute matches this biotype
        genes = gtf.get_regions(element_type="gene",
                                attribute="gene_type", value=biotype)
        genes.write(filename="hg38_genes_" + biotype + ".bed")
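
The filter above relies on the ``gene_type`` key stored in the ninth (attribute) column of each GTF record. As a rough illustration of what such a filter involves, the sketch below parses that column with the standard library and selects gene records by biotype. The helper names and the two toy records are made up for this example, and it is not how GenomKit is implemented. GENCODE files use ``gene_type``, while Ensembl files use ``gene_biotype``, so the key is a parameter here.

```python
import re


def parse_gtf_attributes(field):
    """Parse 'key "value";' pairs from a GTF attribute column into a dict."""
    return dict(re.findall(r'(\S+) "([^"]*)"', field))


def genes_of_biotype(lines, biotype, key="gene_type"):
    """Return BED-like tuples (chrom, start, end, name, strand) for matching genes."""
    selected = []
    for line in lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "gene":
            continue  # keep only well-formed gene records
        attrs = parse_gtf_attributes(cols[8])
        if attrs.get(key) == biotype:
            # GTF is 1-based inclusive; BED is 0-based half-open.
            selected.append((cols[0], int(cols[3]) - 1, int(cols[4]),
                             attrs.get("gene_name", "."), cols[6]))
    return selected


# Two made-up records, for illustration only.
toy_gtf = [
    'chr1\ttoy\tgene\t11869\t14409\t.\t+\t.\tgene_id "G1"; gene_type "lncRNA"; gene_name "A";',
    'chr1\ttoy\tgene\t65419\t71585\t.\t+\t.\tgene_id "G2"; gene_type "protein_coding"; gene_name "B";',
]
lncrna_genes = genes_of_biotype(toy_gtf, "lncRNA")
print(lncrna_genes)
```
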
