diff --git a/content/04.methods.md b/content/04.methods.md
index f1dfee4..1203174 100644
--- a/content/04.methods.md
+++ b/content/04.methods.md
@@ -1,22 +1,44 @@
 ## Materials and Methods
 
-### Data generation
-  - how data was generated in different labs using 10X and then sent to the Data Lab
+### Data generation and processing
 
-### Data processing (do we need this section?)
-  - Mention that all data was processing using `scpca-nf` either by us or external submitters
+Raw data and metadata were generated and compiled by each lab and institution contributing to the Portal.
+Single-cell or single-nuclei libraries were generated using one of the commercially available kits from 10x Genomics.
+For bulk RNA-seq, RNA was collected and sequenced using either paired-end or single-end sequencing. 
+For spatial transcriptomics, cDNA libraries were generated using the Visium kit from 10x Genomics.
+All libraries were processed using our open-source pipeline, `scpca-nf`, to produce summarized gene expression data. 
 
 ### Processing single-cell and single-nuclei RNA-seq data with alevin-fry
-  - Use of salmon alevin and alevin-fry to process all raw FASTQ files
-  - Information on index used
-  - Parameter choices for alevin-fry
+  
+To quantify RNA-seq gene expression for each cell or nucleus in a library, `scpca-nf` uses `salmon alevin` [@doi:10.1186/s13059-020-02151-8] and `alevin-fry` [@doi:10.1038/s41592-022-01408-3] to generate a gene-by-cell counts matrix.
+Prior to mapping, we generated an index using transcripts from both spliced cDNA and unspliced cDNA sequences, denoted as the `splici` index [@doi:10.1038/s41592-022-01408-3].
+The index was generated from the human genome, GRCh38, Ensembl version 104. 
+`salmon alevin` was run using selective alignment to the `splici` index with the `--rad` option to generate a reduced alignment data (RAD) file required for input to `alevin-fry`. 
+
+The RAD file was used as input to the recommended `alevin-fry` workflow, with the following customizations.
+At the `generate-permit-list` step, we used the `--unfiltered-pl` option to provide a list of expected barcodes specific to the 10x kit used to generate each library.
+The `quant` step was run using the `cr-like-em` resolution strategy for feature quantification and UMI de-duplication. 
 
 ### Post alevin-fry processing of single-cell and single-nuclei RNA-seq data
-  - filtering of empty droplets
-  - removal of low quality cells
-  - normalization
-  - HVG selection
-  - PCA and UMAP calculation
+
+The output from running `alevin-fry` includes a gene-by-cell counts matrix, with counts from both spliced and unspliced reads for all potential cell barcodes.
+This output is read into R to create a `SingleCellExperiment` using the `fishpond::load_fry()` function. 
+The resulting `SingleCellExperiment` contains a `counts` assay with a gene-by-cell counts matrix in which spliced and unspliced reads for each gene are summed.
+We also include a `spliced` assay that contains a gene by cell counts matrix with only spliced reads. 
+These matrices include all potential cells, including empty droplets, and are provided in the "unfiltered" objects included in downloads from the Portal.
+
+Each droplet was tested for deviation from the ambient RNA profile using `DropletUtils::emptyDropsCellRanger()` and those with an FDR ≤ 0.01 were retained as likely cells.
+If a library did not have a sufficient number of droplets and `DropletUtils::emptyDropsCellRanger()` failed, cells with fewer than 100 UMIs were removed.
+Gene expression data for any cells that remain after filtering are provided in the "filtered" objects. 
+
+In addition to removing empty droplets, `scpca-nf` also removes cells from downstream analysis that are likely to be compromised by damage or low-quality sequencing. 
+`miQC` was used to calculate the probability of each cell being compromised [@doi:10.1371/journal.pcbi.1009290]. 
+Any cells with a probability of being compromised greater than 0.75 or with fewer than 200 genes detected were removed before further processing.
+The gene expression counts from the remaining cells were log-normalized using the deconvolution method from Lun, Bach, and Marioni [@doi:10.1186/s13059-016-0947-7]. 
+`scran::modelGeneVar()` was used to model gene variance from the log-normalized counts, and `scran::getTopHVGs()` was used to select the top 2000 high-variance genes.
+These were used as input to calculate the top 50 principal components using `scater::runPCA()`. 
+Finally, UMAP embeddings were calculated from the principal components with `scater::runUMAP()`. 
+The raw and log-normalized counts, list of 2000 high-variance genes, principal components, and UMAP embeddings are all stored in the "processed" object. 
 
 ### Quantifying gene expression for libraries with CITE-seq or cell hashing
   - How we used alevin-fry to quantify ADT and HTO libraries