Update documentation.
GregorySchwartz committed Mar 29, 2021
1 parent 66d1ea7 commit 358391c
Showing 8 changed files with 204,748 additions and 244 deletions.
12 changes: 10 additions & 2 deletions README.org
@@ -29,6 +29,14 @@ different perspective of single cells, using our [[http://github.com/GregorySchw
and tree measures to describe simultaneously large and small populations,
without additional parameters or runs. See below for a full list of features.

* New features for v2.2.0.0

- =--no-edger= replaced with =--edger= as the default is now Kruskal-Wallis.
- Can now use backgrounds for motifs.
- Can specify a motif command for genome analysis (e.g. =findMotifsGenome.pl= from HOMER).
- Temporary directories are now variables to correctly specify location.
- Added q-values for differential.
- Updated documentation for =too-many-peaks=.

* New features for v2.0.0.0

@@ -1004,10 +1012,10 @@ too-many-cells motifs \

In this example, we use the output from a differential expression analysis using
=too-many-cells differential= from our merged peaks. Using a complete genome
file used by our motif program of choice (here homer, but defaults to MEME) with
file used by our motif program of choice (here HOMER, but defaults to MEME) with
=--motif-genome=, we want to provide the motif program with the top 1000 most
differential peaks using =--top-n=. Lastly, while the default uses MEME, we find
homer to be much faster. The prior command shows the use of another program to
HOMER to be much faster. The prior command shows the use of another program to
find the motifs, making sure the =%s= for input and output are in the right
locations (check =too-many-cells motifs -h=).

356 changes: 185 additions & 171 deletions index.html

Large diffs are not rendered by default.

3,258 changes: 3,258 additions & 0 deletions too-many-peaks_doc/out/NKG7.svg
3,099 changes: 3,099 additions & 0 deletions too-many-peaks_doc/out/NKG7_raw.svg
197,773 changes: 197,773 additions & 0 deletions too-many-peaks_doc/out/NK_vs_other.csv

Large diffs are not rendered by default.

3 changes: 3 additions & 0 deletions too-many-peaks_doc/out/NK_vs_other_NKG7.csv
@@ -0,0 +1,3 @@
feature,log2FC,pVal,qVal
chr19:51874860-51875969,3.7957024090666374,0.0,0.0

309 changes: 248 additions & 61 deletions too-many-peaks_doc/too-many-peaks.html

Large diffs are not rendered by default.

182 changes: 172 additions & 10 deletions too-many-peaks_doc/too-many-peaks.org
@@ -263,27 +263,189 @@ too-many-cells matrix-output \
For more information about the capabilities of visualization and differential
expression, check out [[https://gregoryschwartz.github.io/too-many-cells/]]!

#+header: :exports both
#+header: :results file
#+begin_src shell :async
too-many-cells make-tree \
--prior out \
-m ./out_min_200_peaks/cluster_peaks/union_fragments.tsv.gz \
--draw-leaf "DrawItem (DrawContinuous [\"chr19:51874860-51875969\"])" \
--custom-region "chr19:51874860-51875969" \
--draw-mark "MarkModularity" \
--dendrogram-output "NKG7_sat_10_union.svg" \
--draw-scale-saturation 10 \
--output out_test \
> test
#+end_src

#+RESULTS:
: f9b4dec5336a11a5109a73b70636c10a

#+header: :exports both
#+header: :results file
#+begin_src shell :async
too-many-cells make-tree \
--matrix-path ./data/pbmc/atac_v1_pbmc_5k_fragments.tsv.gz \
--filter-thresholds "(1000, 1)" \
--binwidth 5000 \
--output out_no_lsa \
--matrix-output mat \
> clusters_no_lsa.csv
#+end_src

#+RESULTS:
: 5ea178cae47c2c3f67d6a77bfa6222a8

#+header: :exports both
#+header: :results file
#+begin_src shell :async
too-many-cells make-tree \
--prior out_no_lsa \
-m ./out_no_lsa/mat \
--draw-leaf "DrawItem (DrawContinuous [\"chr19:51874860-51875969\"])" \
--custom-region "chr19:51874860-51875969" \
--draw-mark "MarkModularity" \
--dendrogram-output "NKG7.svg" \
--draw-scale-saturation 15 \
--output out_no_lsa \
> clusters_no_lsa.csv
#+end_src

#+RESULTS:
[[file:]]

* Identify NK cells

Now that we have a base tree with higher resolution peaks, we can try
searching for known cell populations such as NK cells. While we can use the
=classify= entry point of =too-many-cells= to link bulk reference data with
single-cell data, we will use basic known markers to exemplify the visualization
features of the tree. Here, we will focus only on the =NKG7= region for NK
cells. So, let's look at the accessibility of that region on the tree,
making sure to overlay node numbers for easy reference! For maximum
resolution, we'll use the full tree rather than the pruned tree.

#+header: :exports both
#+header: :results file
#+begin_src shell :async
too-many-cells make-tree \
--prior out \
-m ./out_min_200_peaks/cluster_peaks/union_fragments.tsv.gz \
--draw-leaf "DrawItem (DrawContinuous [\"chr19:51874860-51875969\"])" \
--custom-region "chr19:51874860-51875969" \
--draw-mark "MarkModularity" \
--dendrogram-output "NKG7.svg" \
--draw-node-number \
--draw-scale-saturation 10 \
--output out \
> clusters.csv

printf "./out/NKG7.svg"
#+end_src

#+RESULTS:
[[file:./out/NKG7.svg]]

Here, =--custom-region= tells =too-many-peaks= to create a new feature within
that specific region. We can also use the original fragments to see the
accessibility on the tree before peak finding and filtering.

#+header: :exports both
#+header: :results file
#+begin_src shell :async
too-many-cells make-tree \
--prior out \
-m ./data/pbmc/atac_v1_pbmc_5k_fragments.tsv.gz \
--draw-leaf "DrawItem (DrawContinuous [\"chr19:51874860-51875969\"])" \
--custom-region "chr19:51874860-51875969" \
--draw-mark "MarkModularity" \
--dendrogram-output "NKG7_raw.svg" \
--draw-node-number \
--draw-scale-saturation 10 \
--output out \
> clusters.csv

printf "./out/NKG7_raw.svg"
#+end_src

#+RESULTS:
[[file:./out/NKG7_raw.svg]]

Based on the coloring and the node number overlay,
there seems to be a high level of accessibility within node 85.
To further investigate, let's see what the differential accessibility is between
node 85 and the rest of the tree (seeing more than the top 100 features and without
using =edgeR= for scATAC-seq):

#+header: :exports both
#+header: :results file
#+begin_src shell :async
too-many-cells differential \
-m ./out_min_200_peaks/cluster_peaks/union_fragments.tsv.gz \
--prior out \
--nodes "([118,1], [85])" \
--normalization "TotalNorm" \
--top-n 1000000000 \
> ./out/NK_vs_other.csv

printf "./out/NK_vs_other.csv"
#+end_src

#+RESULTS:
[[file:./out/NK_vs_other.csv]]

These results are liable to change with the inclusion of
=--blacklist-regions-file=, which should filter out unwanted regions (as noted
above). We can also see just our specific region:

#+header: :exports both
#+header: :results file
#+begin_src shell :async
too-many-cells differential \
-m ./out_min_200_peaks/cluster_peaks/union_fragments.tsv.gz \
--prior out \
--nodes "([118,1], [85])" \
--normalization "TotalNorm" \
--custom-region "chr19:51874860-51875969" \
> ./out/NK_vs_other_NKG7.csv

printf "./out/NK_vs_other_NKG7.csv"
#+end_src

#+RESULTS:
[[file:./out/NK_vs_other_NKG7.csv]]

As expected, there's some difference at the =NKG7= locus. Now we can see what
motifs may be enriched in this differential.
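
As a quick sanity check on the full differential table, the significant
features can be tallied by the sign of their fold change. The following is only
a sketch under stated assumptions: it assumes the =feature,log2FC,pVal,qVal=
header of the differential output, a q-value cutoff of 0.05 chosen purely for
illustration, and no quoted commas in the feature column. =summarize_diff= is a
hypothetical helper, not part of =too-many-cells=.

#+begin_src shell
# summarize_diff FILE [CUTOFF]
# Hypothetical helper (not part of too-many-cells): tally significant features
# by the sign of log2FC. Assumes columns feature,log2FC,pVal,qVal.
summarize_diff() {
  awk -F, -v q="${2:-0.05}" '
    NR == 1 { next }                                   # skip the header row
    $2 ~ /[Ii]nf|[Nn]a[Nn]/ { skipped++; next }        # non-finite fold changes
    $4 + 0 < q && $2 + 0 > 0 { positive++ }            # significant, log2FC > 0
    $4 + 0 < q && $2 + 0 < 0 { negative++ }            # significant, log2FC < 0
    END { printf "positive=%d negative=%d skipped=%d\n", positive, negative, skipped }
  ' "$1"
}
#+end_src

For example, =summarize_diff ./out/NK_vs_other.csv= prints a single line of
counts. Rows with non-finite fold changes are reported separately rather than
being counted in either direction.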

* Motifs

=too-many-peaks= can also identify motifs from differential expression analyses
using tools such as MEME and homer. For instance, with homer's
using tools such as MEME and HOMER. For instance, with HOMER's
=findMotifsGenome.pl= in your path, you can use the input from
=too-many-peaks='s differential accessibility output from
=too-many-cells differential= to find enriched motifs:
=too-many-cells differential= that we just calculated to find enriched motifs
(getting rid of infinity fold changes, or "divide by zero"):

# too-many-cells motifs --diff-file tmp.csv --motif-genome ~/research/genomes/hg19.fa --top-n 10000000000 -o motifs_homer_1_sustained_vs_28_untreated --motif-command "/mnt/data1/apps/homer/homer-4.9/bin/findMotifs.pl %s fasta %s"
#+header: :exports both
#+header: :results file
#+begin_src shell :async
cat ./out/NK_vs_other.csv | csvsql --query "SELECT * FROM stdin WHERE qVal < 0.05 AND log2FC > 0" | grep -v inf | grep -v Infinity > tmp.csv

too-many-cells motifs \
--diff-file diff.csv \
--motif-genome /path/to/hg19.fa \
--top-n 1000 \
--diff-file tmp.csv \
--motif-genome hg19 \
--top-n 100000000 \
-o homer_out \
--motif-genome-command "findMotifsGenome.pl %s %s %s"
#+end_src

This command would output motifs in the =homer_out= directory found using
=findMotifsGenome.pl= on the top 1000 differentially accessible sites from
=diff.csv= which was output from a =too-many-cells differential= run. Usually,
this file would be filtered from significant peaks in a certain direction.
This command outputs motifs in the =homer_out= directory found using
=findMotifsGenome.pl= on the significant and positive differentially accessible
sites in =tmp.csv=, which we filtered from the =./out/NK_vs_other.csv= output of
our =too-many-cells differential= run. We set =--top-n= to a high number to
include all of these sites instead of just the top sites.
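
The preparation steps for this motif run can also be sketched without =csvkit=.
The helpers below are hypothetical and only illustrate the idea: =require_tool=
checks that a program is on the path, and =filter_diff= approximates the
=csvsql= and =grep= filtering with =awk=. The sketch assumes the
=feature,log2FC,pVal,qVal= header, uses 0.05 as the default cutoff to mirror the
query above, and matches non-finite values only in the =log2FC= column, whereas
the =grep -v= calls above match anywhere in the line.

#+begin_src shell
# require_tool NAME
# Hypothetical helper: succeed only if NAME resolves to a command on the path.
require_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    return 0
  fi
  printf 'missing tool: %s\n' "$1" >&2
  return 1
}

# filter_diff FILE [CUTOFF]
# Hypothetical helper: keep the header plus rows with qVal < CUTOFF and a
# finite, positive log2FC. Assumes columns feature,log2FC,pVal,qVal.
filter_diff() {
  awk -F, -v q="${2:-0.05}" '
    NR == 1 { print; next }                  # keep the header row
    $2 ~ /[Ii]nf|[Nn]a[Nn]/ { next }         # drop non-finite fold changes
    $4 + 0 < q && $2 + 0 > 0 { print }       # significant and positive
  ' "$1"
}

# Possible usage before the motif run (paths as in the example above):
# require_tool findMotifsGenome.pl && filter_diff ./out/NK_vs_other.csv > tmp.csv
#+end_src

Because =awk= passes each kept row through unchanged, the numeric formatting in
=tmp.csv= may differ from what =csvsql= would write.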

These kinds of analyses and more are all available using =too-many-peaks=, which
makes full use of the =too-many-cells= suite of tools so be sure to [[https://gregoryschwartz.github.io/too-many-cells/][check it
out!]]
