Uploading Data For DMS
To maintain the code or push new data to the visualizer, you need to be a member of the JGEnglishLab GitHub group. To push new data to the visualizer, you will need to:
- Clone the repository locally
- Add new data as it is generated
- Commit and push the new data when you are ready for it to go live
Start by cloning the JGEnglishLab.github.io repository. To do this, run git clone https://github.com/JGEnglishLab/JGEnglishLab.github.io.git in the directory where you want to keep all the data.
Then delete the tre_mpra/ directory and the mpra_vis.html file. You will not need either of them.
You only need to clone the repository once. After that, you can come back and add new data whenever you want.
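If you prefer to script the one-time setup, the two bash functions below are a minimal sketch of it. clone_repo and remove_unused_files are hypothetical helper names, not part of the repository: the first runs the same git clone command given above, and the second deletes tre_mpra/ and mpra_vis.html and is meant to be run from inside the cloned JGEnglishLab.github.io directory.

```shell
# Hypothetical helpers (bash) for the one-time setup.

# Clone the visualizer repository into the current directory.
clone_repo() {
  git clone https://github.com/JGEnglishLab/JGEnglishLab.github.io.git
}

# Delete the TRE MPRA files, which the DMS workflow does not need.
# Run this from inside the cloned JGEnglishLab.github.io directory.
remove_unused_files() {
  rm -rf tre_mpra/ mpra_vis.html
}
```

Typical use is clone_repo, then cd JGEnglishLab.github.io, then remove_unused_files.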
When you have new data that you want to go live, there are a few steps you need to take:
- Add the raw data
- Download and add the gnomAD file
- Modify the metaData.tsv file
- Modify the descriptions.json file
- If your protein is a GPCR, download the snake plot SVG
- Run wrangle.R
All of the raw data should be kept in /dms/data/raw_data.
wrangle.R assumes a specific raw data layout, described below, and converts it into the data used by the visualizer.
The conditions are listed along the top row and the mutation names along the first column. The keyword "score" marks the value for each mutation under each condition (columns with "SE" or "epsilon" are ignored). If your input files don't follow this format, you will need to either modify the input files to match it or change wrangle.R to accept the new input format.
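Before running wrangle.R, it can help to see which columns of a raw file look like score columns. The bash function below is a rough pre-check, not wrangle.R's actual logic. It assumes a comma-separated file whose first row holds the column names, keeps names containing "score" (case-sensitive), and drops any that also contain "SE" or "epsilon". list_score_columns is a hypothetical helper name.

```shell
# Hypothetical pre-check (bash), not the logic wrangle.R itself uses.
# Assumes a comma-separated file with the column names in the first row.
list_score_columns() {
  local file="$1"
  # Take the header row, strip a Windows carriage return, put one column name
  # per line, keep names containing "score", then drop SE/epsilon columns.
  # "|| true" keeps a header with no score columns from counting as an error.
  head -n 1 "$file" | tr -d '\r' | tr ',' '\n' \
    | grep 'score' | grep -v -e 'SE' -e 'epsilon' || true
}
```

An empty result means the file probably does not match the expected format.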
Another important note: wrangle.R assumes that every insertion is G, GS, or GSG, labeled i1, i2, and i3 respectively. If you start using other insertions, you will need to modify wrangle.R.
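The insertion convention amounts to a small lookup table. The bash sketch below writes it out and fails loudly on any other label, which is exactly the case where wrangle.R would need to be extended. insertion_residues is a hypothetical helper name.

```shell
# Hypothetical helper (bash): the residues wrangle.R assumes each insertion
# label stands for. Any other label is reported as unsupported.
insertion_residues() {
  case "$1" in
    i1) echo "G" ;;
    i2) echo "GS" ;;
    i3) echo "GSG" ;;
    *)
      echo "unsupported insertion label '$1': wrangle.R only knows i1, i2, i3" >&2
      return 1
      ;;
  esac
}
```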
Start by going to the gnomAD website and searching for your protein of interest. Make sure that the variant you pull up is exactly the same as the one used in your DMS.
Once you are sure you have the correct variant, scroll down to the "gnomAD variants" section. Uncheck pLoF, Synonymous, and Other so that Missense / Inframe indel is the only box checked. Then click Export variants to CSV and put the downloaded file in the gnomad/ directory.
The metaData.tsv file maps the input files to your protein. It has four columns:
- protein: the name of the protein. This is how the name will appear in the visualization's drop-down menu.
- raw_file: the exact name of the raw file you put in the raw_data directory.
- gpcrdb_id: only needed if the protein you are adding is a GPCR. To find it, go to the GPCRdb website and search for your GPCR; the name in parentheses is the gpcrdb_id. For example, searching for GPR68 shows "ogr1_human" as its gpcrdb_id.
- gnomad_file: the exact name of the gnomAD file you put in the gnomad/ directory.
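A minimal bash sketch for adding a row to metaData.tsv and checking that the files it names exist. It assumes the columns appear in the order listed above, separated by tabs, and that the first line of the file is a header row; the raw-data and gnomAD directories are passed explicitly because their exact paths depend on your checkout. add_metadata_row and check_metadata_files are hypothetical helper names.

```shell
# Hypothetical helpers (bash). Assumed layout of metaData.tsv: a header row,
# then tab-separated columns protein, raw_file, gpcrdb_id, gnomad_file.

# Append one row; pass "" as gpcrdb_id for proteins that are not GPCRs.
add_metadata_row() {
  local meta="$1" protein="$2" raw_file="$3" gpcrdb_id="$4" gnomad_file="$5"
  # Start a new line first if the file does not already end with one.
  if [ -s "$meta" ] && [ -n "$(tail -c 1 "$meta")" ]; then echo >> "$meta"; fi
  printf '%s\t%s\t%s\t%s\n' "$protein" "$raw_file" "$gpcrdb_id" "$gnomad_file" >> "$meta"
}

# Report every raw_file or gnomad_file entry that does not exist in the given
# directory; returns non-zero if anything is missing.
check_metadata_files() {
  local meta="$1" raw_dir="$2" gnomad_dir="$3" status=0 kind name dir
  # awk -F'\t' keeps empty fields such as a blank gpcrdb_id; bash `read` with
  # IFS set to a tab would merge consecutive tabs and shift the columns.
  while IFS=$'\t' read -r kind name; do
    if [ "$kind" = raw ]; then dir="$raw_dir"; else dir="$gnomad_dir"; fi
    if [ -z "$name" ] || [ ! -f "$dir/$name" ]; then
      echo "missing: $dir/$name" >&2
      status=1
    fi
  done < <(awk -F'\t' 'NR > 1 { sub(/\r$/, ""); if (NF == 0) next;
                                print "raw\t" $2; print "gnomad\t" $4 }' "$meta")
  return "$status"
}
```

For GPR68, the gpcrdb_id argument would be ogr1_human; the raw and gnomAD file names are whatever you saved in those directories.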
If the protein you are adding is a GPCR, go to the GPCRdb website and download an SVG of its snake plot. Note that the N- and C-terminal domains, as well as the intracellular and extracellular domains, are hidden by default, so you must click on the squares to expand them before downloading the SVG.
After expanding all the squares, click the download button and select the "SVG" option. Save the SVG in the snake folder and rename it to the protein name you used in metaData.tsv, for example GPR68.svg.
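Because the SVG must be named after the protein column of metaData.tsv, a quick check can catch a misnamed file. This bash sketch assumes metaData.tsv has a header row followed by tab-separated protein, raw_file, gpcrdb_id, gnomad_file columns, and treats any row with a non-empty gpcrdb_id as a GPCR; check_snakeplots is a hypothetical helper name.

```shell
# Hypothetical helper (bash). Assumes metaData.tsv has a header row and
# tab-separated columns protein, raw_file, gpcrdb_id, gnomad_file. Every row
# with a non-empty gpcrdb_id is expected to have <snake_dir>/<protein>.svg.
check_snakeplots() {
  local meta="$1" snake_dir="$2" status=0 protein
  while IFS= read -r protein; do
    if [ ! -f "$snake_dir/$protein.svg" ]; then
      echo "missing snake plot: $snake_dir/$protein.svg" >&2
      status=1
    fi
  done < <(awk -F'\t' 'NR > 1 { sub(/\r$/, ""); if ($3 != "") print $1 }' "$meta")
  return "$status"
}
```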
wrangle.R reads all of the raw data and formats it to be compatible with the visualizer. The output of the script is dms_data_wrangled.csv. The script overwrites the dms_data_wrangled.csv file that is currently in the directory, so if you do not want to lose the original, rename it or save a copy somewhere else.
To run wrangle.R, open a terminal in the directory that contains wrangle.R and run Rscript wrangle.R.
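Since wrangle.R overwrites its output, one option is to make the backup part of the run. The bash sketch below copies the current dms_data_wrangled.csv to a timestamped file before calling Rscript wrangle.R; backup_wrangled and run_wrangle are hypothetical helper names, and both are meant to be run from the directory that contains wrangle.R.

```shell
# Hypothetical helpers (bash). Run them from the directory containing wrangle.R.

# Copy the current dms_data_wrangled.csv to a timestamped file in backup_dir
# (default: the current directory) and print the copy's path. Does nothing
# if there is no existing output yet.
backup_wrangled() {
  local backup_dir="${1:-.}"
  local src="dms_data_wrangled.csv"
  local dest
  if [ -f "$src" ]; then
    dest="$backup_dir/dms_data_wrangled.$(date +%Y%m%d-%H%M%S).csv"
    cp "$src" "$dest" && echo "$dest"
  fi
}

# Back up the old output, then regenerate it with wrangle.R.
run_wrangle() {
  backup_wrangled "$@" && Rscript wrangle.R
}
```

A backup left inside the repository shows up in git status as an untracked file. Passing an existing directory outside the repository, such as a hypothetical ~/dms_backups (run_wrangle ~/dms_backups), keeps it out of the way.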
Before you push anything to GitHub, I recommend that you test the visualizer locally to make sure everything looks the way it should. To do this, install Visual Studio Code, open the Extensions view, and install "Live Server".
After it is installed, open the directory where you cloned the repository, then click the "Go Live" button (it should be at the bottom right of the window).
That should open a tab in your web browser. Click on dms_vis.html to open a local version of the visualization and verify that everything looks right. If it does, you are ready to push your data.
To go live, open a terminal in the directory where you cloned the repository. If you type git status, you should see a list of the files that have changed.
You will need to git add the files that you want to commit. Specifically, you should git add the following files:
- metaData.tsv
- dms_data_wrangled.csv
- The file you added to the gnomad directory
- The file you added to the raw_data directory
- The file you added to the snake directory (if a GPCR)
- description.json
After you've used git add to add those files, run git status one more time to make sure that you added everything you want.
Then commit the changes by typing git commit -m "..." (replace the ... with a message describing the changes you are making). Then type git push. Once the push has finished, your new data should be live!
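Staging files by name, as described above, matters here: after deleting tre_mpra/ and mpra_vis.html during setup, a blanket git add -A (or git add . run from the repository root) would also stage those deletions, and pushing would remove them from the live site. The bash helpers below sketch the name-by-name workflow. stage_files and commit_and_push are hypothetical helper names, and the paths you pass are the ones from the list above, relative to your current directory.

```shell
# Hypothetical helpers (bash), run inside the cloned repository.

# Stage only the paths given as arguments, then show the short status so you
# can confirm nothing else (such as the deleted TRE MPRA files) is staged.
stage_files() {
  git add -- "$@" && git status --short
}

# Commit the staged changes with the given message, then push them.
commit_and_push() {
  git commit -m "$1" && git push
}
```

For example, call stage_files with metaData.tsv, dms_data_wrangled.csv, and the new gnomAD, raw data, snake plot, and description files, check the status it prints, and then run commit_and_push with your commit message.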