This project is a series of laboratory exercises in DNA analysis using R, covering computational techniques for analyzing genomic data. Each lab explores a different aspect of data wrangling, visualization, and statistical analysis in the context of DNA sequencing and genomics.
-
Lab 1: Introduction to DNA Data Analysis
- Focus: Basic DNA sequence manipulation, extraction, and visualization techniques using R.
- Key Tools: ggplot2, dplyr, and basic R functions for string manipulation and plotting.
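As a minimal illustration of the string manipulation this lab covers, the sketch below counts bases and builds a reverse complement using only base R. It assumes an uppercase sequence containing only A, C, G, and T; the function names are hypothetical. Biostrings, listed among the project's libraries, offers equivalents such as alphabetFrequency() and reverseComplement() for DNAString objects.

```r
# Hypothetical helpers for basic DNA string manipulation in base R.
# Assumes uppercase sequences over A, C, G, T only (no ambiguity codes).

base_counts <- function(dna) {
  bases <- strsplit(dna, "")[[1]]
  # Fixed factor levels keep a zero count for any base that is absent
  table(factor(bases, levels = c("A", "C", "G", "T")))
}

reverse_complement <- function(dna) {
  # chartr() swaps each base for its complement; rev() reverses the order
  complement <- chartr("ACGT", "TGCA", dna)
  paste(rev(strsplit(complement, "")[[1]]), collapse = "")
}

dna <- "ATGCGTAACG"
print(base_counts(dna))
print(reverse_complement(dna))  # "CGTTACGCAT"
```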
-
Lab 2: DNA Mutation Analysis
- Focus: Identifying and analyzing mutations within DNA sequences.
- Key Tools: Mutation frequency analysis, comparative genomics, and visualization of mutations across different samples.
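One simple form of mutation frequency analysis compares each sample with a reference and reports the fraction of differing positions. The sketch below assumes the sequences are already aligned and of equal length (substitutions only, no indels); the function name and example sequences are made up for illustration.

```r
# Hypothetical sketch: per-sample mutation frequency relative to a reference.
# Assumes pre-aligned sequences of equal length, so only substitutions count.

mutation_frequency <- function(reference, samples) {
  ref <- strsplit(reference, "")[[1]]
  sapply(samples, function(s) {
    query <- strsplit(s, "")[[1]]
    stopifnot(length(query) == length(ref))
    mean(query != ref)  # fraction of positions that differ from the reference
  })
}

reference <- "ACGTACGTAC"
samples <- c(sample1 = "ACGTACGTAC",
             sample2 = "ACGAACGTAC",
             sample3 = "TCGAACGTAA")
freqs <- mutation_frequency(reference, samples)
print(freqs)  # sample1 0.0, sample2 0.1, sample3 0.3
```

The named vector can be turned into a data frame and compared across samples with ggplot2, for example with geom_col().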
-
Lab 3: GC Content Analysis
- Focus: Calculating GC content in DNA sequences and its implications for genomic stability.
- Key Tools: Sliding window algorithms for GC content analysis and R plotting libraries for visualizing GC content distribution.
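A sliding-window GC calculation can be sketched in base R as follows. The window size, step size, function names, and example sequence are illustrative assumptions; the labs may use different parameters or packages.

```r
# Hypothetical sliding-window GC content calculator in base R.
# Assumes an uppercase A/C/G/T sequence at least `window` bases long.

gc_content <- function(dna) {
  bases <- strsplit(dna, "")[[1]]
  mean(bases %in% c("G", "C"))  # fraction of G or C bases
}

sliding_gc <- function(dna, window = 10, step = 5) {
  n <- nchar(dna)
  stopifnot(n >= window)
  starts <- seq(1, n - window + 1, by = step)
  gc <- vapply(starts, function(i) gc_content(substr(dna, i, i + window - 1)),
               numeric(1))
  data.frame(start = starts, end = starts + window - 1, gc = gc)
}

dna <- "GGGGCCCCATATATATGCGC"
res <- sliding_gc(dna, window = 10, step = 5)
print(res)  # windows 1-10, 6-15, 11-20 with GC 0.8, 0.3, 0.4
```

With ggplot2, the profile can be drawn with ggplot(res, aes(x = start, y = gc)) + geom_line(). When the sequence length minus the window is not a multiple of the step, the last few bases fall outside every window.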
-
Lab 4: DNA Fragment Analysis
- Focus: Analyzing DNA fragment lengths and their distribution in genomic samples.
- Key Tools: Histograms, density plots, and statistical tests to compare fragment lengths across different conditions.
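To compare fragment length distributions between two conditions, a rank-based test such as the Wilcoxon rank-sum test avoids assuming normality. The lengths below are invented toy values, not project data, and the choice of test is an assumption rather than the labs' prescribed method.

```r
# Toy fragment lengths (bp) for two hypothetical conditions; illustrative only.
fragments <- data.frame(
  condition = rep(c("control", "treatment"), each = 8),
  length = c(148, 152, 160, 155, 149, 163, 158, 151,
             170, 165, 178, 172, 169, 181, 175, 168)
)

# Median fragment length per condition
medians <- aggregate(length ~ condition, data = fragments, FUN = median)
print(medians)

# Two-sided Wilcoxon rank-sum (Mann-Whitney) test of the two distributions
wt <- wilcox.test(length ~ condition, data = fragments)
print(wt)
```

Histograms and density plots of the same data frame can be drawn with ggplot2's geom_histogram() and geom_density(), mapping fill to condition.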
-
Lab 5: Sequence Alignment
- Focus: Aligning DNA sequences and evaluating the quality of alignments.
- Key Tools: Pairwise and multiple sequence alignment techniques, BLAST, and visualization of alignment results.
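Pairwise global alignment is commonly computed with the Needleman-Wunsch dynamic programming algorithm. The base R sketch below returns only the optimal score (no traceback); the scoring values (match +1, mismatch -1, linear gap -2) and the function name are assumptions for illustration.

```r
# Hypothetical Needleman-Wunsch global alignment score with a linear gap penalty.
nw_score <- function(a, b, match = 1, mismatch = -1, gap = -2) {
  x <- strsplit(a, "")[[1]]
  y <- strsplit(b, "")[[1]]
  n <- length(x)
  m <- length(y)
  # score[i + 1, j + 1] is the best score for aligning x[1..i] with y[1..j]
  score <- matrix(0, nrow = n + 1, ncol = m + 1)
  score[, 1] <- gap * (0:n)  # leading gaps in b
  score[1, ] <- gap * (0:m)  # leading gaps in a
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      sub <- if (x[i] == y[j]) match else mismatch
      score[i + 1, j + 1] <- max(score[i, j] + sub,      # align x[i] with y[j]
                                 score[i, j + 1] + gap,  # x[i] against a gap in b
                                 score[i + 1, j] + gap)  # y[j] against a gap in a
    }
  }
  score[n + 1, m + 1]
}

print(nw_score("ACGT", "AGT"))  # 1: three matches and one gap
```

Filling the matrix takes time and memory proportional to the product of the sequence lengths, so for real data library aligners and BLAST (for database searches) are far more practical.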
-
Lab 6: Phylogenetic Tree Construction
- Focus: Constructing phylogenetic trees based on DNA sequence similarity.
- Key Tools: Distance matrices, neighbor-joining methods, and tree visualization libraries.
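Distance-based tree building starts from a pairwise distance matrix. The sketch below computes p-distances (proportion of differing sites) between aligned sequences in base R, then clusters them with hclust(method = "average"), which is UPGMA, as a dependency-free stand-in for neighbor-joining. The sequences and function name are illustrative assumptions.

```r
# Hypothetical p-distance matrix for pre-aligned, equal-length sequences.
p_distance_matrix <- function(seqs) {
  chars <- strsplit(seqs, "")
  stopifnot(length(unique(lengths(chars))) == 1)  # sequences must be aligned
  n <- length(seqs)
  d <- matrix(0, n, n, dimnames = list(names(seqs), names(seqs)))
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      d[i, j] <- mean(chars[[i]] != chars[[j]])  # proportion of differing sites
    }
  }
  d
}

seqs <- c(s1 = "ACGTACGTAC", s2 = "ACGTACGTAA",
          s3 = "ACGAACGTTA", s4 = "TCGAACGTTA")
d <- p_distance_matrix(seqs)
print(d)

# UPGMA (average linkage) tree as a base-R stand-in for neighbor-joining
tree <- hclust(as.dist(d), method = "average")
print(tree$height)  # merge heights 0.1, 0.1, 0.3
```

If the ape package is installed (phytools builds on it), ape::nj(as.dist(d)) returns a neighbor-joining tree of class phylo that can be drawn with plot().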
-
Lab 7: RNA Sequencing Data Analysis
- Focus: Analyzing RNA sequencing data to study gene expression levels.
- Key Tools: RNA-seq data processing, differential expression analysis, and visualizing expression levels with heatmaps and volcano plots.
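The core quantities behind a volcano plot are a per-gene log2 fold change and a p-value. The sketch below computes both on a tiny invented count matrix using log-transformed counts and Welch t-tests, followed by Benjamini-Hochberg adjustment. This is a deliberate simplification: dedicated RNA-seq tools such as DESeq2 or edgeR model counts directly and handle normalization, which this example skips.

```r
# Invented counts: 3 genes x 6 samples (3 control, 3 treated); illustrative only.
counts <- matrix(
  c(100, 110,  95, 400, 420, 390,   # geneA
     50,  55,  48,  52,  49,  51,   # geneB
    200, 210, 190,  60,  55,  65),  # geneC
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC"),
                  c("ctrl1", "ctrl2", "ctrl3", "trt1", "trt2", "trt3"))
)
group <- c("ctrl", "ctrl", "ctrl", "trt", "trt", "trt")

logc <- log2(counts + 1)  # pseudocount of 1 avoids log2(0)

de <- data.frame(
  gene = rownames(counts),
  # difference of mean log2 counts, used here as the log2 fold change (trt vs ctrl)
  log2FC = rowMeans(logc[, group == "trt"]) - rowMeans(logc[, group == "ctrl"]),
  # Welch two-sample t-test per gene (t.test default var.equal = FALSE)
  p = apply(logc, 1, function(x) t.test(x[group == "trt"], x[group == "ctrl"])$p.value)
)
de$padj <- p.adjust(de$p, method = "BH")  # Benjamini-Hochberg correction
de$neg_log10_p <- -log10(de$p)             # y-axis of a volcano plot
print(de)
```

stats::heatmap(logc) gives a quick base-R heatmap, and plotting neg_log10_p against log2FC (for example with ggplot2's geom_point()) gives a volcano plot.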
-
Lab 8: DNA Variant Analysis
- Focus: Identifying and analyzing single nucleotide polymorphisms (SNPs) and other variants in DNA sequences.
- Key Tools: Variant calling tools, annotation of variants, and visualization of variant distribution across populations.
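At its simplest, identifying single nucleotide variants means comparing each sequence with a reference, position by position. The sketch below does that for assembled, aligned sequences and then tabulates how often each variant position occurs across individuals. Real variant calling works from sequencing reads and accounts for depth, base quality, and genotype likelihoods, which this toy example ignores; the function name and sequences are hypothetical.

```r
# Hypothetical helper: list substitutions of each sample relative to a reference.
# Assumes assembled, pre-aligned sequences of equal length (no reads, no indels).
call_snps <- function(reference, samples) {
  ref <- strsplit(reference, "")[[1]]
  rows <- lapply(names(samples), function(id) {
    alt <- strsplit(samples[[id]], "")[[1]]
    stopifnot(length(alt) == length(ref))
    pos <- which(alt != ref)
    data.frame(sample = rep(id, length(pos)), pos = pos,
               ref = ref[pos], alt = alt[pos])
  })
  do.call(rbind, rows)
}

reference <- "ACGTACGTAC"
samples <- c(ind1 = "ACGTACGTAC", ind2 = "ACGAACGTAC",
             ind3 = "ACGAACGTAT", ind4 = "ACGAACGTAC")
snps <- call_snps(reference, samples)
print(snps)

# Fraction of individuals carrying a variant at each position
variant_freq <- table(snps$pos) / length(samples)
print(variant_freq)  # position 4: 0.75, position 10: 0.25
```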
-
Final Project: Comprehensive DNA Data Analysis
- Focus: An end-to-end analysis of a complete DNA dataset that combines the techniques from the previous labs.
- Key Tools: A combination of sequence alignment, mutation analysis, GC content, fragment analysis, and variant calling to provide a holistic view of genomic data.
-
R Libraries:
- ggplot2, dplyr, Biostrings, phytools, seqinr, and more.
-
Data:
- Publicly available DNA sequencing datasets.
-
Software:
- RStudio, Jupyter Notebook with an R kernel, or a similar environment for R-based DNA analysis.
-
Usage:
- Download the repository and open the .Rmd file for the lab you are working on.
- Ensure that all required libraries are installed before running the scripts.
- Run the code chunks in the R Markdown file in order to perform the DNA analysis.
- Review the results and visualizations produced by each lab section to interpret the genomic findings.
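Before running the notebooks, it can help to check which of the listed libraries are missing. The sketch below is a convenience assumption, not part of the repository: it only reports missing packages and leaves installation to the user. The list covers only the packages named under R Libraries; individual labs may need others.

```r
# Hypothetical check for the project's listed R libraries.
required <- c("ggplot2", "dplyr", "Biostrings", "phytools", "seqinr")

missing_packages <- function(pkgs) {
  # requireNamespace() returns FALSE (quietly) when a package is not installed
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(required)
if (length(to_install) > 0) {
  message("Missing packages: ", paste(to_install, collapse = ", "))
  # CRAN packages: install.packages(c("ggplot2", "dplyr", "phytools", "seqinr"))
  # Biostrings is on Bioconductor: BiocManager::install("Biostrings")
} else {
  message("All listed packages are installed.")
}
```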
-
Authors:
- Lisa Mechaly Bensoussan
- Emmanuelle Fareau
- Dan Levy