Uploading Data For DMS

samhimes92 edited this page Mar 21, 2024 · 13 revisions

Overview

To maintain the code and push new data to the visualization, you need to be a member of the JGEnglishLab GitHub group. To push new data to the visualizer, you will need to:

  1. Clone the repository locally
  2. Add new data as it is generated
  3. Commit and push the new data when you are ready for it to go live

Clone the repository

Start by cloning the JGEnglishLab.github.io repository.

Do this by running git clone https://github.com/JGEnglishLab/JGEnglishLab.github.io.git in the directory where you want to keep all the data and code.

Then delete the tre_mpra/ directory and the mpra_vis.html file. You will not need either of them.

You only need to clone the repository once. After you do, you can come back and add new data whenever you want.
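
The clone-and-trim step can be sketched as a small shell helper. The function name trim_clone is hypothetical (it is not part of the repository); the URL, tre_mpra/, and mpra_vis.html are the ones named above.

```shell
# Hypothetical helper, not part of the repository: removes the two items
# that the DMS workflow does not need from a fresh clone.
trim_clone() {
  repo_dir="$1"
  rm -rf -- "$repo_dir/tre_mpra"
  rm -f -- "$repo_dir/mpra_vis.html"
}

# One-time setup, run from the directory where you keep data and code:
#   git clone https://github.com/JGEnglishLab/JGEnglishLab.github.io.git
#   trim_clone JGEnglishLab.github.io
```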

Add new data

When you have new data that you want to go live, there are a few steps to take.

  1. Add the raw data
  2. Download and add the gnomad file
  3. Modify the metaData.tsv file
  4. Modify the descriptions.json file
  5. If your protein is a GPCR, download the snakeplot SVG
  6. Run wrangle.R

Add raw data

All of the raw data should be kept in the /dms/data/raw_data directory.

The raw data is assumed to look like this. wrangle.R takes this input format and converts it to data used by the visualizer.

[Screenshot: example raw data file]

The conditions are listed along the top row, and the mutation names are listed down the first column. The keyword "score" marks the value for each mutation and condition; columns labeled "SE" or "epsilon" are ignored. If your input files don't look like this, you will need to either modify them to match this format or change wrangle.R to accept the new format.
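
Before running wrangle.R, it can help to confirm that a raw file actually has "score" columns. The sketch below is a rough pre-check, not a description of how wrangle.R selects columns. It assumes the raw file is tab-delimited and that "score" appears as a cell in one of the header rows; the function name score_columns is hypothetical.

```shell
# Hypothetical pre-check: print the 1-based positions of the columns whose
# header cell contains the keyword "score", using the first row that has any.
# Assumption: the raw file is tab-delimited.
score_columns() {
  awk -F'\t' '
    {
      found = 0
      for (i = 1; i <= NF; i++) {
        if ($i ~ /score/) {
          printf "%s%d", (found ? " " : ""), i
          found = 1
        }
      }
      if (found) { print ""; exit }
    }
  ' "$1"
}
```

If the function prints nothing, no header cell contains "score", and the file probably needs reformatting before wrangle.R can use it.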

Another important note: wrangle.R assumes that all insertions are G, GS, or GSG, which are labeled i1, i2, and i3 respectively. If you start using other insertions, you will need to modify wrangle.R.
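
The insertion labeling convention can be written out as a small lookup. This is only an illustration of the mapping described above, not code taken from wrangle.R; the function name insertion_label is hypothetical.

```shell
# Illustration of the labeling convention (not taken from wrangle.R):
# G -> i1, GS -> i2, GSG -> i3. Any other insertion is rejected.
insertion_label() {
  case "$1" in
    G)   echo i1 ;;
    GS)  echo i2 ;;
    GSG) echo i3 ;;
    *)   echo "unsupported insertion: $1" >&2; return 1 ;;
  esac
}
```

An insertion outside these three makes the function fail, which mirrors the point at which wrangle.R itself would need to be modified.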

Download gnomad file

Start by going to the gnomad website.

Search for your protein of interest.

Make sure that the variant you pulled up is exactly the same as the one you used in your DMS.

Once you are sure that you have the correct variant pulled up, scroll down to the "gnomAD variants" section. Uncheck pLoF, Synonymous, and Other, so that Missense / Inframe indel is the only box checked. Then click Export variants to CSV. Put the downloaded file in the /dms/data/gnomad/ directory.

Modify metaData.tsv

The metaData.tsv file maps input files to your protein. There are 4 columns in metaData.tsv.

  1. protein: The name of the protein. This is how the name will show up in the drop-down of the visualization.

  2. raw_file: The exact name of the raw file that you put in the dms/data/raw_data/ directory.

  3. gpcrdb_id: You only need to add this if the protein you are adding is a GPCR. To find the gpcrdb_id, go to the GPCRdb website and search for the name of your GPCR. The name in parentheses is the gpcrdb_id. For example, when I search GPR68, I can see that "ogr1_human" is the gpcrdb_id.

  4. gnomad_file: The exact name of the gnomad file that you put in the dms/data/gnomad/ directory.
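
Because the file names in metaData.tsv must match the files on disk exactly, a quick consistency check can catch typos before wrangle.R runs. The sketch below rests on several assumptions that the page does not state: that metaData.tsv is tab-delimited, that its first row is a header, and that the columns appear in the order listed above. The function name check_metadata and its two path arguments are hypothetical.

```shell
# Hypothetical check: for every row of metaData.tsv, confirm that the named
# raw file and gnomad file exist under the data directory.
# Assumptions: tab-delimited, first row is a header, columns in the order
# protein, raw_file, gpcrdb_id, gnomad_file.
check_metadata() {
  meta="$1"       # path to metaData.tsv
  data_dir="$2"   # path to the dms/data directory
  awk -F'\t' -v dir="$data_dir" '
    NR == 1 { next }   # skip the assumed header row
    {
      raw = dir "/raw_data/" $2
      gnomad = dir "/gnomad/" $4
      # getline returns a negative value when the file cannot be opened
      if ((getline line < raw) < 0) { print "missing raw file for " $1 ": " raw; bad = 1 }
      close(raw)
      if ((getline line < gnomad) < 0) { print "missing gnomad file for " $1 ": " gnomad; bad = 1 }
      close(gnomad)
    }
    END { exit (bad ? 1 : 0) }
  ' "$meta"
}
```

awk is used instead of a shell read loop so that an empty gpcrdb_id column (a non-GPCR protein) does not shift the remaining fields.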

Modify description.json

Download snakeplot from gpcrdb

If the protein that you are adding is a GPCR, go to the GPCRdb website, search for your GPCR, and download an SVG of the snakeplot. Note that the N- and C-terminal domains, as well as the intracellular and extracellular domains, are hidden by default. Before downloading the SVG, you must click on the squares to expand them.

After expanding all the squares, click the download button and select the "SVG" option. Save the SVG in the /dms/data/snake/ directory, and rename the file to the protein name that you put in metaData.tsv. For example, GPR68.svg.
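
Saving and renaming the snakeplot can be done in one move. A minimal sketch, assuming the SVG was downloaded somewhere outside the repository; the function name install_snakeplot and the download file name in the usage comment are hypothetical.

```shell
# Hypothetical helper: move a downloaded snakeplot into the snake directory,
# named after the protein exactly as it appears in metaData.tsv.
install_snakeplot() {
  downloaded_svg="$1"   # SVG saved from GPCRdb
  protein="$2"          # must match the protein column in metaData.tsv
  snake_dir="$3"        # path to the dms/data/snake directory
  mv -- "$downloaded_svg" "$snake_dir/$protein.svg"
}

# Example (the download file name is made up for illustration):
#   install_snakeplot ~/Downloads/snakeplot.svg GPR68 dms/data/snake
```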

Run wrangle.R

wrangle.R reads all of the raw data and formats it to be compatible with the visualizer. The output of the script is dms_data_wrangled.csv. The script overwrites the dms_data_wrangled.csv file that is currently in the directory, so if you do not want to lose the original, give it a new name or save it somewhere else.

To run wrangle.R, open a terminal in the directory that contains wrangle.R and run Rscript wrangle.R.
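
Since wrangle.R overwrites its previous output, one way to keep the original is to copy it to a timestamped name first. This is a sketch of one possible convention, not something the repository prescribes; the function name backup_wrangled and the backup naming scheme are assumptions.

```shell
# Hypothetical helper: copy the current output to a timestamped file before
# wrangle.R overwrites it. Does nothing if no previous output exists.
backup_wrangled() {
  out="dms_data_wrangled.csv"
  if [ -f "$out" ]; then
    cp -- "$out" "${out%.csv}_$(date +%Y%m%d_%H%M%S).csv"
  fi
}

# Run from the directory that contains wrangle.R:
#   backup_wrangled && Rscript wrangle.R
```

The backup lands next to the original, so it will show up as an untracked file in git status; leave it out of the commit unless you want it published.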

Commit and push the new data

Checking it locally

Before you push anything to GitHub, I recommend testing the visualizer locally to make sure everything looks the way it should. To do this, install Visual Studio Code, click on the extensions window, and install "Live Server".

After it is installed, open the directory where you cloned the repository in Visual Studio Code. Then click the "Go Live" button, which should be on the bottom right-hand side of the screen.

That should open a tab in your internet browser. Click on dms_vis.html to open the visualization. This will pull up a local version of the visualization. Verify that everything looks right. If it does then you are ready to push your data.

Going live

To go live, open the directory where you cloned the repository in the terminal. If you type git status you should see a list of things that have been changed.

You will need to git add the files that you want to commit. Specifically you should git add the following files.

  1. metaData.tsv
  2. dms_data_wrangled.csv
  3. The file you added to the gnomad directory
  4. The file you added to the raw_data directory
  5. The file you added to the snake directory (if the protein is a GPCR)
  6. description.json

After you've used git add to add those files, run git status one more time to make sure that you added everything you want. Then commit the changes by typing git commit -m "..." (replace the ... with a message describing the changes you are about to make). Then type git push. Once the push has finished, your new data should be live!
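
The staging step can be wrapped so that a mistyped path is caught before anything is added. A minimal sketch: the function name stage_dms_update is hypothetical, and the paths you pass depend on where the files live in your clone.

```shell
# Hypothetical helper: stage exactly the files passed as arguments, refusing
# to continue if any of them does not exist, then show what is staged.
stage_dms_update() {
  for f in "$@"; do
    if [ ! -e "$f" ]; then
      echo "not found: $f" >&2
      return 1
    fi
  done
  git add -- "$@"
  git status --short
}

# Run from the cloned repository, passing the path to each file listed above.
# Then commit and push as described:
#   git commit -m "..."
#   git push
```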
