update docs

chaochungkuo · Feb 21, 2024 · 9c92502
1 parent 22817f7
commit 9c92502
Showing 4 changed files with 82 additions and 75 deletions.
diff --git a/docs/source/examples.rst b/docs/source/examples.rst
@@ -1,9 +1,10 @@
-==============
-Usage Examples
-==============
+=========
+Tutorials
+=========
 
-Here is a list of some possible usage cases with GenomKit.
+The following tutorials show how to use GenomKit for various bioinformatics tasks.
 
 .. toctree::
    examples_fastq
-   examples_bed
+   examples_bed
+   examples_gtf
diff --git a/docs/source/examples_bed.rst b/docs/source/examples_bed.rst
@@ -1,70 +1,6 @@
-=======================
-Examples with BED files
-=======================
-
-Extract exon, intron, and intergenetic regions in BED format from a GTF file
-----------------------------------------------------------------------------
-
-``GTF_hg38`` is the path to the hg38 GTF file for annotation. Now you want to generate 3 BED files as below:
-
-- hg38_exons.bed
-- hg38_introns.bed
-- hg38_intergenic_regions.bed
-
-.. code-block:: python
-
-    from genomkit import GRegions
-    from genomkit import GAnnotation
-
-    gtf = GAnnotation(file_path=GTF_hg38, file_format="gtf")
-    genes = gtf.get_regions(element_type="gene")
-    exons = gtf.get_regions(element_type="exon")
-    introns = genes.subtract(exons, inplace=False)
-
-    chromosomes = GRegions(name="chromosomes")
-    chromosomes.get_chromosomes(organism="hg38")
-    intergenic_regions = chromosomes.subtract(genes, inplace=False)
-    exons.write(filename="hg38_exons.bed")
-    introns.write(filename="hg38_introns.bed")
-    intergenic_regions.write(filename="hg38_intergenic_regions.bed")
-
-
-Get all promoter regions in BED format from a GTF file
-------------------------------------------------------
-
-``GTF_hg38`` is the path to the hg38 GTF file for annotation. Now you want to generate 3 BED files as below:
-
-.. code-block:: python
-
-    from genomkit import GAnnotation
-
-    gtf = GAnnotation(file_path=GTF_hg38, file_format="gtf")
-    genes = gtf.get_regions(element_type="gene")
-    promoters = genes.resize(extend_upstream=2000,
-                            extend_downstream=0,
-                            center="5prime", inplace=False)
-    promoters.write(filename="hg38_promoters.bed")
-
-Extract the genes by their biotypes from a GTF file
----------------------------------------------------
-
-``GTF_hg38`` is the path to the hg38 GTF file for annotation. Now you want to generate BED files for the biotypes below:
-
-- protein_coding
-- lncRNA
-- snRNA
-- miRNA
-
-.. code-block:: python
-
-    from genomkit import GAnnotation
-
-    gtf = GAnnotation(file_path=GTF_hg38, file_format="gtf")
-    target_biotypes = ["protein_coding", "lncRNA", "snRNA", "miRNA"]
-    for biotype in target_biotypes:
-        genes = gtf.get_regions(element_type="gene",
-                                attribute="gene_type", value=biotype)
-        genes.write(filename="hg38_genes_"+biotype+".bed")
+========================
+Starting with a BED file
+========================
 
 Get the sequences in FASTA format from a BED file
 -------------------------------------------------

diff --git a/docs/source/examples_fastq.rst b/docs/source/examples_fastq.rst
@@ -1,6 +1,6 @@
-=========================
-Examples with FASTQ files
-=========================
+==========================
+Starting with a FASTQ file
+==========================
 
 Trimming a FASTQ file for both sequences and quality
 ----------------------------------------------------

diff --git a/docs/source/examples_gtf.rst b/docs/source/examples_gtf.rst
@@ -0,0 +1,70 @@
+========================
+Starting with a GTF file
+========================
+
+``GAnnotation`` handles both ``GTF`` and ``GFF`` files, so you can replace the GTF file in the tutorials below with a GFF file. Only GTF files are shown here as examples.
+
+Extract exon, intron, and intergenic regions in BED format from a GTF file
+----------------------------------------------------------------------------
+
+``GTF_hg38`` is the path to the hg38 GTF annotation file. The goal is to generate the following three BED files:
+
+- hg38_exons.bed
+- hg38_introns.bed
+- hg38_intergenic_regions.bed
+
+.. code-block:: python
+
+    from genomkit import GRegions
+    from genomkit import GAnnotation
+
+    gtf = GAnnotation(file_path=GTF_hg38, file_format="gtf")
+    genes = gtf.get_regions(element_type="gene")
+    exons = gtf.get_regions(element_type="exon")
+    introns = genes.subtract(exons, inplace=False)
+
+    chromosomes = GRegions(name="chromosomes")
+    chromosomes.get_chromosomes(organism="hg38")
+    intergenic_regions = chromosomes.subtract(genes, inplace=False)
+    exons.write(filename="hg38_exons.bed")
+    introns.write(filename="hg38_introns.bed")
+    intergenic_regions.write(filename="hg38_intergenic_regions.bed")
+
+
+Get all promoter regions in BED format from a GTF file
+------------------------------------------------------
+
+``GTF_hg38`` is the path to the hg38 GTF annotation file. The goal is to generate a single BED file of promoter regions, ``hg38_promoters.bed``:
+
+.. code-block:: python
+
+    from genomkit import GAnnotation
+
+    gtf = GAnnotation(file_path=GTF_hg38, file_format="gtf")
+    genes = gtf.get_regions(element_type="gene")
+    promoters = genes.resize(extend_upstream=2000,
+                            extend_downstream=0,
+                            center="5prime", inplace=False)
+    promoters.write(filename="hg38_promoters.bed")
+
+Extract the genes by their biotypes from a GTF file
+---------------------------------------------------
+
+``GTF_hg38`` is the path to the hg38 GTF annotation file. The goal is to generate one BED file for each of the biotypes below:
+
+- protein_coding
+- lncRNA
+- snRNA
+- miRNA
+
+.. code-block:: python
+
+    from genomkit import GAnnotation
+
+    gtf = GAnnotation(file_path=GTF_hg38, file_format="gtf")
+    target_biotypes = ["protein_coding", "lncRNA", "snRNA", "miRNA"]
+    for biotype in target_biotypes:
+        genes = gtf.get_regions(element_type="gene",
+                                attribute="gene_type", value=biotype)
+        genes.write(filename=f"hg38_genes_{biotype}.bed")
+
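Note on the GTF tutorial: the intron and intergenic steps both rely on interval subtraction (genes minus exons, chromosomes minus genes). The sketch below illustrates that idea in plain Python for a single chromosome, assuming half-open ``[start, end)`` coordinates as in BED. The function name ``subtract_intervals`` is hypothetical, and this is not GenomKit's implementation of ``GRegions.subtract``.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end), as in BED (assumption)


def subtract_intervals(regions: List[Interval],
                       masks: List[Interval]) -> List[Interval]:
    """Return the parts of each region that no mask covers.

    Hypothetical helper for illustration only; all intervals are
    assumed to lie on the same chromosome.
    """
    masks = sorted(masks)
    result = []
    for start, end in sorted(regions):
        cursor = start  # left edge of the part not yet accounted for
        for m_start, m_end in masks:
            if m_end <= cursor:
                continue  # mask ends before the remaining part begins
            if m_start >= end:
                break  # masks are sorted, so no later mask can overlap
            if m_start > cursor:
                result.append((cursor, m_start))  # gap before this mask
            cursor = max(cursor, m_end)
            if cursor >= end:
                break
        if cursor < end:
            result.append((cursor, end))  # tail after the last mask
    return result


# One gene with three exons: the uncovered gaps are the introns.
gene = [(100, 1000)]
exons = [(100, 200), (400, 500), (900, 1000)]
introns = subtract_intervals(gene, exons)
print(introns)  # [(200, 400), (500, 900)]
```

The same function applied to whole-chromosome intervals and gene intervals would yield intergenic regions, which mirrors the second ``subtract`` call in the tutorial.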