Uploading Data For DMS

samhimes92 edited this page Mar 21, 2024 · 13 revisions

Overview

To maintain the code and push new data to the visualizer, you need to be a member of the JGEnglishLab GitHub group. To push new data to the visualizer you will need to

  1. Clone the repository locally
  2. Add new data as it is generated
  3. Commit and push the new data when you are ready for it to go live

Clone the repository

Start by cloning the JGEnglishLab.github.io repository.

Do this by running git clone https://github.com/JGEnglishLab/JGEnglishLab.github.io.git in the directory where you want to keep all the data.

Then delete the tre_mpra/ directory and the mpra_vis.html file. You will not need either of them.

You only need to clone the repository once. After that, you can come back and add new data whenever you want.

Add new data

When you have new data that you want to go live, there are a few steps that you need to take.

  1. Add the raw data
  2. Download and add the gnomad file
  3. Modify the metaData.tsv file
  4. Modify the descriptions.json file
  5. If your protein is a GPCR, download the snakeplot SVG

Add raw data

All of the raw data should be kept in /dms/data/raw_data.

The raw data is assumed to look like this. wrangle.R takes this input format and converts it to data used by the visualizer.

Screen Shot 2024-03-20 at 3 12 32 PM

The conditions are listed along the top row. The mutation names are listed along the first column. The keyword "score" marks the column that holds the value for each mutation and each condition (columns with "SE" or "epsilon" are ignored). If the input files don't look like this, you will need to either modify the input files to match this format or change wrangle.R to accept your new input file format.
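The column rule above can be sketched in a few lines. This is only an illustration, not what wrangle.R actually does: it assumes each column has already been reduced to a hypothetical (condition, value type) pair, since the exact file layout is only shown in the screenshot. The function name `conditions_with_scores` is made up for this example.

```python
# Value types that the wiki says are ignored.
IGNORED_KINDS = {"SE", "epsilon"}


def conditions_with_scores(columns):
    """Return the conditions that have a "score" column, in order.

    `columns` is a list of (condition, value_type) pairs. This pair
    representation is an assumption made for the sketch.
    """
    conditions = []
    for condition, kind in columns:
        if kind in IGNORED_KINDS:
            continue  # SE and epsilon columns are skipped
        if kind == "score" and condition not in conditions:
            conditions.append(condition)
    if not conditions:
        raise ValueError("no 'score' columns found; check the input format")
    return conditions
```

A file whose columns yield no "score" entries at all is a sign that the input does not match the expected format.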

Another important note is that wrangle.R assumes all insertions are G, GS, or GSG. They are labeled as i1, i2, and i3 respectively. If you start using other insertions you will need to modify wrangle.R.
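As a rough illustration of that labeling (a Python sketch, not the code in wrangle.R; the names `INSERTION_LABELS` and `insertion_label` are hypothetical):

```python
# Labels stated in the wiki: G -> i1, GS -> i2, GSG -> i3.
INSERTION_LABELS = {"G": "i1", "GS": "i2", "GSG": "i3"}


def insertion_label(inserted_residues):
    """Map an inserted sequence to its label, rejecting anything else."""
    try:
        return INSERTION_LABELS[inserted_residues]
    except KeyError:
        raise ValueError(
            f"unsupported insertion {inserted_residues!r}; "
            "wrangle.R would need to be modified to handle it"
        ) from None
```

Any insertion outside these three raises an error here, which mirrors the point that new insertion types need a change to wrangle.R first.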

Download gnomad file

Start by going to the gnomAD website.

Search for your protein of interest.

Make sure that the variant that you pulled up is exactly the same as the one that you used in your DMS.

Once you are sure that you have the correct variant pulled up, scroll down to the "gnomAD variants" section. Uncheck pLoF, Synonymous and Other. Make sure Missense / Inframe indel is the only box checked. Then click Export variants to CSV. Put the downloaded file in the gnomad/ directory.

Screen Shot 2024-03-20 at 3 50 06 PM

Modify metaData.tsv

The metaData.tsv file helps to map input files to your protein. There are 4 columns in metaData.tsv.

  1. protein This is the name of the protein. This is how the name will show up in the drop down of the visualization.

Screen Shot 2024-03-20 at 4 02 30 PM

  2. raw_file Put the exact name of the raw file that you put in the raw_data directory.

  3. gpcrdb_id You only need to add this if the protein that you are adding is a GPCR. To find the gpcrdb_id, go to the GPCRdb website and search for your GPCR. The name in parentheses is the gpcrdb_id. For example, when I search GPR68, I can see that "ogr1_human" is the gpcrdb_id.

Screen Shot 2024-03-20 at 4 13 49 PM

  4. gnomad_file Put the exact name of the gnomad file that you put in the gnomad/ directory.
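Because the file names in metaData.tsv have to match exactly, a small check before pushing can catch typos. The following is a minimal sketch, not part of the repository. It assumes metaData.tsv is tab-separated with a header row naming the four columns above, and the function name `check_metadata` and its directory arguments are hypothetical.

```python
import csv
from pathlib import Path

# Column names taken from the list above; a header row is assumed.
EXPECTED_COLUMNS = ["protein", "raw_file", "gpcrdb_id", "gnomad_file"]


def check_metadata(metadata_path, raw_dir, gnomad_dir):
    """Return a list of problems found in metaData.tsv (empty if none)."""
    problems = []
    with open(metadata_path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        fieldnames = reader.fieldnames or []
        missing = [c for c in EXPECTED_COLUMNS if c not in fieldnames]
        if missing:
            return ["missing column(s): " + ", ".join(missing)]
        for row in reader:
            protein = row["protein"]
            raw_name = row["raw_file"] or ""
            gnomad_name = row["gnomad_file"] or ""
            # The names must match files on disk exactly.
            if not (Path(raw_dir) / raw_name).is_file():
                problems.append(f"{protein}: raw_file '{raw_name}' not found")
            if not (Path(gnomad_dir) / gnomad_name).is_file():
                problems.append(f"{protein}: gnomad_file '{gnomad_name}' not found")
    return problems
```

To use it, call `check_metadata` with the path to metaData.tsv and the paths to your local raw data and gnomad/ directories, then print whatever it returns. An empty list means every listed file was found.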

Modify description.json

Download snakeplot from gpcrdb

If the protein that you are adding is a GPCR, go to the GPCRdb website and download an SVG of the snakeplot. Note that the N- and C-terminal domains, as well as the intracellular and extracellular domains, are hidden by default. Before downloading the SVG you must click on the squares to expand them.

Screen Shot 2024-03-21 at 11 11 41 AM

After expanding all the squares, click the download button and select the "SVG" option. Save the SVG in the snake folder. Rename the file to match the protein name that you put in metaData.tsv. For example, GPR68.svg.
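The naming rule can be checked with a short script. This is a sketch under stated assumptions: it treats any row with a non-empty gpcrdb_id as a GPCR, takes the rows as plain dictionaries, and uses the hypothetical name `missing_snakeplots`. None of this comes from the repository itself.

```python
from pathlib import Path


def missing_snakeplots(rows, snake_dir):
    """List the SVG file names that are expected but not present.

    `rows` is an iterable of dicts with "protein" and "gpcrdb_id" keys,
    for example rows read from metaData.tsv.
    """
    missing = []
    for row in rows:
        if not row.get("gpcrdb_id"):
            continue  # assumed not a GPCR, so no snakeplot is expected
        svg = Path(snake_dir) / f"{row['protein']}.svg"
        if not svg.is_file():
            missing.append(svg.name)
    return missing
```

An empty result means every GPCR listed has an SVG named after its protein in the snake folder.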

Commit and push the new data

Checking it locally

Before you actually push anything to GitHub, I recommend that you run the visualizer locally to make sure everything looks the way it should. To do this, install Visual Studio Code. Click on the Extensions icon and install the "Live Server" extension.

After it is installed, open the directory where you cloned the repository in Visual Studio Code. Then click the "Go Live" button. (It should be on the bottom right-hand side of the screen.)

That should open a tab in your web browser. Click on dms_vis.html to open a local version of the visualization. Verify that everything looks right. If it does, you are ready to push your data.

Going live
