
Uploading Data For DMS

samhimes92 edited this page Mar 20, 2024 · 13 revisions

Overview

To maintain and push data to the visualizer, you will need to be a member of the JGEnglishLab GitHub group. To push new data to the visualizer, you will need to:

  1. Clone the repository locally
  2. Add new data as it is generated
  3. Commit and push the new data when you are ready for it to go live

Clone the repository

Start by cloning the JGEnglishLab.github.io repository.

Do this by running git clone https://github.com/JGEnglishLab/JGEnglishLab.github.io.git in the directory where you want to keep all the data.

Then delete the tre_mpra/ directory and the mpra_vis.html file. You will not need either of them.

You only need to clone the repository once. After that, you can come back and add new data whenever you want.

Add new data

When you have new data that you want to go live, there are a few steps that you need to take.

  1. Add the raw data
  2. Download and add the gnomad file
  3. Modify the metaData.tsv file
  4. Modify the descriptions.json file
  5. If your protein is a GPCR, download the snakeplot SVG

Add raw data

All of the raw data should be kept in /dms/data/raw_data.

The raw data is assumed to look like this. wrangle.R takes this input format and converts it to data used by the visualizer.

Screen Shot 2024-03-20 at 3 12 32 PM

The conditions are listed along the top row. The mutation names are listed along the first column. The keyword "score" marks the columns that hold the value for each mutation and each condition (columns with "SE" or "epsilon" are ignored). If your input files don't look like this, you will need to either modify the input files to match this format or change wrangle.R to accept your new input file format.
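As a rough illustration of the column rule described above, the following Python sketch picks the score columns out of a header row. It is not how wrangle.R is implemented. The function name `select_score_columns` and the example column names are hypothetical, and the sketch assumes a plain, case-sensitive match on "score".

```python
def select_score_columns(header):
    """Return the column names that hold scores.

    Assumes the first entry of `header` is the mutation-name column and
    that a score column is any later column whose name contains "score".
    SE and epsilon columns are skipped because they lack that keyword.
    """
    _mutation_column, *condition_columns = header
    return [name for name in condition_columns if "score" in name]


# Hypothetical header row; real condition names will differ.
example_header = ["mutation", "pH6_score", "pH6_SE", "pH6_epsilon", "pH7_score"]
print(select_score_columns(example_header))
```

Running a check like this on the header of a new file is a quick way to see whether any score columns would be picked up at all before handing the file to wrangle.R.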

Another important note is that wrangle.R assumes all insertions are G, GS, or GSG. They are labeled as i1, i2, and i3, respectively. If you start using other insertions, you will need to modify wrangle.R.
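The insertion labeling can be pictured as a simple lookup. The Python sketch below only illustrates the mapping stated above; the names `INSERTION_LABELS` and `label_insertion` are hypothetical, and the actual logic lives in wrangle.R.

```python
# Insertion sequences that the text says wrangle.R expects, with their labels.
INSERTION_LABELS = {"G": "i1", "GS": "i2", "GSG": "i3"}


def label_insertion(inserted_residues):
    """Map an inserted sequence to its label, or fail loudly if it is unknown."""
    try:
        return INSERTION_LABELS[inserted_residues]
    except KeyError:
        raise ValueError(
            f"Unsupported insertion {inserted_residues!r}; "
            "wrangle.R would need to be modified to handle it"
        ) from None


print(label_insertion("GS"))
```

An insertion outside these three raises an error here, which mirrors the point that any new insertion type needs a matching change in wrangle.R.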

Download gnomad file

Start by going to the gnomAD website.

Search for your protein of interest.

Make sure that the variant that you pulled up is exactly the same as the one that you used in your DMS.

Once you are sure that you have the correct variant pulled up, scroll down to the "gnomAD variants" section. Uncheck pLoF, Synonymous, and Other, so that Missense / Inframe indel is the only box checked. Then click Export variants to CSV and put the downloaded file in the gnomad/ directory.

Screen Shot 2024-03-20 at 3 50 06 PM

Modify metaData.tsv

The metaData.tsv file helps to map input files to your protein. There are 4 columns in metaData.tsv.

  1. protein: The name of the protein. This is how the name will show up in the drop-down menu of the visualization.

Screen Shot 2024-03-20 at 4 02 30 PM

  2. raw_file: The exact name of the raw file that you put in the raw_data directory.

  3. gpcrdb_id: You only need to add this if the protein that you are adding is a GPCR. To find the gpcrdb_id, go to the GPCRdb website and search for your GPCR. The name in parentheses is the gpcrdb_id. For example, when I search GPR68, I can see that "ogr1_human" is the gpcrdb_id.

Screen Shot 2024-03-20 at 4 13 49 PM

  4. gnomad_file: The exact name of the gnomad file that you put in the gnomad/ directory.
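Because raw_file and gnomad_file must match file names exactly, a small check before committing can catch typos. The Python sketch below is one possible way to do that, not part of the repository. It assumes metaData.tsv has a header row with the four column names in the order listed above, and the function name `check_metadata` and the directory arguments are hypothetical.

```python
import csv
from pathlib import Path

# Assumed header row; adjust if the real metaData.tsv differs.
EXPECTED_COLUMNS = ["protein", "raw_file", "gpcrdb_id", "gnomad_file"]


def check_metadata(metadata_path, raw_dir, gnomad_dir):
    """Return a list of human-readable problems found in a metaData.tsv file."""
    problems = []
    with open(metadata_path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        if reader.fieldnames != EXPECTED_COLUMNS:
            return [f"Expected columns {EXPECTED_COLUMNS}, found {reader.fieldnames}"]
        for row in reader:
            # gpcrdb_id is optional (GPCRs only), so only the two file columns are checked.
            for column, directory in (("raw_file", raw_dir), ("gnomad_file", gnomad_dir)):
                file_name = row[column] or ""
                if not file_name or not (Path(directory) / file_name).is_file():
                    problems.append(
                        f"{row['protein']}: {column} {file_name!r} not found in {directory}"
                    )
    return problems
```

Calling `check_metadata` with the path to metaData.tsv, the raw data directory, and the gnomad/ directory returns an empty list when every listed file is present.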

Modify description.json

Download snakeplot from gpcrdb
