Commit

updates to all notebooks and addition of response figure notebook, as well as updates to all DNM data
Tom Sasani committed Jul 7, 2019
1 parent e99ca07 commit f922ffb
Showing 16 changed files with 31,597 additions and 32,094 deletions.
23 changes: 12 additions & 11 deletions README.md
@@ -8,36 +8,37 @@ Below is an example figure from the MS, which summarizes our finding that the pa

![alt text](img/fig3.png)

Two notebooks, `ms_figs.R.ipynb` and `ms_figs.python.ipynb`, can be used to reproduce figures from the manuscript. A notebook called `inter-family-variability.ipynb` can be used to reproduce the statistical analyses associated with inter-family variability in parental age effects. Figures in the manuscript were generated with the versions of each library listed below, though more recent versions (if applicable) will likely work, as well.
Two notebooks, `ms_figs.R.ipynb` and `ms_figs.python.ipynb`, can be used to reproduce figures from the manuscript. A notebook called `inter-family-variability.ipynb` can be used to reproduce the statistical analyses associated with inter-family variability in parental age effects. Finally, figures presented in the main response to reviewers can be reproduced using `response_figures.ipynb`. Figures in the manuscript were generated with the versions of each library listed below, though more recent versions (if applicable) will likely work, as well.

To mitigate compatibility/version issues, we have **packaged all notebooks into a [binder](https://mybinder.org) environment**; to access the environment, **simply click on the badge below**. It might take a minute to load everything.

[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/quinlan-lab/ceph-dnm-manuscript/master)

The files included in the `data` directory are organized as follows:

`f1.dnms.txt` and `f2.dnms.txt` contain a row for every DNM identified in the F1 or F2 generation, respectively. `gonosomal.dnms.txt` and `post-pgcs.dnms.txt` contain a row for every DNM identified as being gonosomal or a post-PGCS mosaic mutation, respectively. Each row in these files is formatted like a heavily annotated BED entry, where the first three columns indicate the chromosome, start position, and end position of the variant, followed by additional columns with per-variant information, such as the reference and alternate alleles, depth and genotype qualities in the proband and parents, etc.
`second_gen.dnms.txt` and `third_gen.dnms.txt` contain a row for every DNM identified in the second or third generation, respectively. `gonosomal.dnms.txt` and `post-pgcs.dnms.txt` contain a row for every DNM identified as being gonosomal or a post-PGCS mosaic mutation, respectively. Each row in these files is formatted like a heavily annotated BED entry, where the first three columns indicate the chromosome, start position, and end position of the variant, followed by additional columns with per-variant information, such as the reference and alternate alleles, depth and genotype qualities in the proband and parents, etc.

For example, the first line of `f1.dnms.txt` is shown below:
For example, the first few lines of `second_gen.dnms.txt` are shown below:

```
chrom start end new_sample_id new_family_id ref alt mut new_paternal_id new_maternal_id kid_ref_depth kid_alt_depth kid_total_depth kid_allele_balance mom_ref_depth mom_alt_depthmom_total_depth dad_ref_depth dad_alt_depth dad_total_depth kid_qual mom_qual dad_qual grandparental_evidence paternal_age_at_conception maternal_age_at_conception phase
1 1088581 1088585 308 19 CTCT C indel 293 294 7 11 18 0.611111111111 45 0 45 36 0 36 99.0 99.0 90.0 0 37.5 33.3 46084 paternal
1 1142254 1142255 538 29 G A CpG>TpG 544 543 17 15 32 0.46875 24 0 24 35 0 35 99.0 63.0 99.0 0 32.7 27.0 paternal
chrom start end new_sample_id new_family_id ref alt mut new_paternal_id new_maternal_id kid_ref_depth kid_alt_depth kid_total_depth kid_allele_balance mom_ref_depth mom_alt_depth mom_total_depth dad_ref_depth dad_alt_depth dad_total_depth kid_qual mom_qual dad_qual paternal_age_at_birth maternal_age_at_birth phase
1 1142254 1142255 538 29 G A CpG>TpG 544 543 17 15 32 0.46875 24 0 24 35 0 35 99.0 63.0 99.0 32.69999999999999 27.0 paternal
1 1461136 1461137 257 16 G C C>G 261 263 12 17 29 0.5862068965517241 27 0 27 45 0 45 99.0 81.0 81.0 30.800000000000008 22.0 paternal
```
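
As a minimal sketch of how the per-variant files might be read, the snippet below parses the header and the two example rows of `second_gen.dnms.txt` shown above, using only the Python standard library. It assumes the columns are whitespace-delimited (the file on disk may well be tab-delimited), and the helper name `parse_dnms` is hypothetical rather than part of this repository.

```python
import io

# Header and two example rows of `second_gen.dnms.txt`, copied from the README.
EXAMPLE = """\
chrom start end new_sample_id new_family_id ref alt mut new_paternal_id new_maternal_id kid_ref_depth kid_alt_depth kid_total_depth kid_allele_balance mom_ref_depth mom_alt_depth mom_total_depth dad_ref_depth dad_alt_depth dad_total_depth kid_qual mom_qual dad_qual paternal_age_at_birth maternal_age_at_birth phase
1 1142254 1142255 538 29 G A CpG>TpG 544 543 17 15 32 0.46875 24 0 24 35 0 35 99.0 63.0 99.0 32.69999999999999 27.0 paternal
1 1461136 1461137 257 16 G C C>G 261 263 12 17 29 0.5862068965517241 27 0 27 45 0 45 99.0 81.0 81.0 30.800000000000008 22.0 paternal
"""


def parse_dnms(handle):
    """Yield one dict per DNM, keyed by column name (hypothetical helper)."""
    header = next(handle).split()
    for line in handle:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != len(header):
            # Assumption: every row has exactly one value per header column.
            raise ValueError(f"expected {len(header)} fields, got {len(fields)}")
        yield dict(zip(header, fields))


dnms = list(parse_dnms(io.StringIO(EXAMPLE)))

for dnm in dnms:
    # In these example rows, kid_allele_balance equals alt depth / total depth.
    balance = int(dnm["kid_alt_depth"]) / int(dnm["kid_total_depth"])
    print(dnm["chrom"], dnm["start"], dnm["end"], dnm["mut"], dnm["phase"], balance)
```

To read the real file, the same helper could be given an open file handle, for example `open("data/second_gen.dnms.txt")`, provided the whitespace-delimited assumption holds.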

`f1.dnms.summary.csv`, `f2.dnms.summary.csv`, `gonosomal.dnms.summary.csv`, and `post-pgcs.dnms.summary.csv` are summary CSV files, and contain a row for every sample in the F1, F2, F1 (again, but for the gonosomal DNMs), and F2 (again, but for the post-PGCS DNMs) generations, respectively. Each row contains summary information about the sample, including the total number of DNMs identified, the numbers that were phased to either parental allele, the callable autosomal fraction in the sample, etc.
`second_gen.dnms.summary.csv`, `third_gen.dnms.summary.csv`, `gonosomal.dnms.summary.csv`, and `post-pgcs.dnms.summary.csv` are summary CSV files. The first two contain a row for every sample in the second and third generations, respectively. The gonosomal and post-PGCS files likewise contain a row for every sample in the second and third generations, respectively, but summarize only the gonosomal or post-PGCS DNMs. Each row contains summary information about the sample, including the total number of DNMs identified, the numbers that were phased to either parental allele, the callable autosomal fraction in the sample, etc.

For example, the first line of `f1.dnms.summary.csv` is shown below:
For example, the first few lines of `second_gen.dnms.summary.csv` are shown below:

```
all_dnms,alpha,autosomal_callable_fraction,autosomal_dnms,dad_age,dad_dnms,dad_dnms_auto,dad_dnms_auto_snv,dad_dnms_snv,family_id,maternal_id,mom_age,mom_dnms,mom_dnms_auto,mom_dnms_auto_snv,mom_dnms_snv,n_children,n_sibs,paternal_id,phased_frac,sample_id,snv_autosomal_dnms,snv_dnms
60.0,0.6440677966101694,2587148570.0,60.0,22.8,38.0,38.0,36.0,18.0,20,329,21.6,21.0,21.0,18.0,18.0,9.0,1.0,330,0.9833333333333333,328,55.0,55.0
all_dnms,alpha,autosomal_callable_fraction,autosomal_dnms,dad_age,dad_dnms,dad_dnms_auto,dad_dnms_auto_snv,dad_dnms_snv,family_id,maternal_id,mom_age,mom_dnms,mom_dnms_auto,mom_dnms_auto_snv,mom_dnms_snv,n_children,n_sibs,paternal_id,phased_frac,sample_id,snv_autosomal_dnms,snv_dnms,mean_depth
57.0,0.6545454545454545,2587148570.0,57.0,22.8,36.0,36.0,34.0,17.0,20,329,21.599999999999994,19.0,19.0,17.0,17.0,9.0,1.0,330,0.9649122807017544,328,52.0,52.0,36.822442140553065
86.0,0.8604651162790697,2560784557.0,86.0,30.800000000000008,74.0,74.0,69.0,11.0,16,263,22.0,12.0,12.0,11.0,11.0,9.0,1.0,261,1.0,257,80.0,80.0,33.838202333693275
```
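
The per-sample summary files can be loaded in a similar way. The sketch below reads the two example rows of `second_gen.dnms.summary.csv` shown above with the standard-library `csv` module and recomputes two of the columns. In these two rows, `alpha` coincides with `dad_dnms / (dad_dnms + mom_dnms)` and `phased_frac` with `(dad_dnms + mom_dnms) / all_dnms`; that is an observation about the example rows only, not a definition confirmed by the authors.

```python
import csv
import io

# Header and two example rows of `second_gen.dnms.summary.csv`, copied from the README.
EXAMPLE = """\
all_dnms,alpha,autosomal_callable_fraction,autosomal_dnms,dad_age,dad_dnms,dad_dnms_auto,dad_dnms_auto_snv,dad_dnms_snv,family_id,maternal_id,mom_age,mom_dnms,mom_dnms_auto,mom_dnms_auto_snv,mom_dnms_snv,n_children,n_sibs,paternal_id,phased_frac,sample_id,snv_autosomal_dnms,snv_dnms,mean_depth
57.0,0.6545454545454545,2587148570.0,57.0,22.8,36.0,36.0,34.0,17.0,20,329,21.599999999999994,19.0,19.0,17.0,17.0,9.0,1.0,330,0.9649122807017544,328,52.0,52.0,36.822442140553065
86.0,0.8604651162790697,2560784557.0,86.0,30.800000000000008,74.0,74.0,69.0,11.0,16,263,22.0,12.0,12.0,11.0,11.0,9.0,1.0,261,1.0,257,80.0,80.0,33.838202333693275
"""

samples = list(csv.DictReader(io.StringIO(EXAMPLE)))

recomputed = []
for sample in samples:
    dad = float(sample["dad_dnms"])
    mom = float(sample["mom_dnms"])
    total = float(sample["all_dnms"])
    # Assumption (inferred from the example rows): share of phased DNMs
    # assigned to the paternal allele.
    paternal_share = dad / (dad + mom)
    # Assumption (inferred from the example rows): fraction of all DNMs
    # that could be phased to either parent.
    phased = (dad + mom) / total
    recomputed.append((sample["sample_id"], paternal_share, phased))
    print(sample["sample_id"], paternal_share, sample["alpha"], phased, sample["phased_frac"])
```

Printing the recomputed and stored values side by side makes it easy to spot any sample where the inferred relationships do not hold.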

#### Dependencies (if **not** using the Binder environment provided above)

#### For `python 3.6`:
#### For `python 3.6.8`:

`scipy v1.1.0`

71 changes: 0 additions & 71 deletions data/f1.dnms.summary.csv

This file was deleted.
