Uploading Data For DMS
Anyone who wants to maintain the code or push new data to the visualization needs to be a member of the JGEnglishLab GitHub group. To push new data to the visualizer, you will need to:
- Clone the repository locally
- Add new data as it is generated
- Commit and push the new data when you are ready for it to go live
Start by cloning the JGEnglishLab.github.io repository. Do this by running `git clone https://github.com/JGEnglishLab/JGEnglishLab.github.io.git` in the directory where you want to keep all the data and code.
Then delete the `tre_mpra/` directory and the `mpra_vis.html` file. You will not need either of them.
You only need to clone the repository once. After you do, you can come back and add new data whenever you want.
When you have new data that you want to go live, there are a few steps you need to take:
- Add the raw data
- Download and add the gnomAD file
- Modify the `metaData.tsv` file
- Modify the `descriptions.json` file
- If your protein is a GPCR, download the snakeplot SVG
- Run `wrangle.R`
All of the raw data should be kept in the `/dms/data/raw_data` directory.
`wrangle.R` assumes the raw data follows the format described below and converts it into the data used by the visualizer.
The conditions are listed along the top row, and the mutation names are listed down the first column. The keyword "score" marks the value for each mutation and each condition (columns containing "SE" or "epsilon" are ignored). If your input files don't follow this format, you will need to either modify them to match it or change `wrangle.R` to accept your new input file format.
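As a rough illustration of the column rule above, the Python sketch below picks out the columns that would count as scores. It is a hypothetical pre-check, not code from `wrangle.R`, and it assumes a single tab-separated header row whose value columns combine a condition name with a keyword (the names `pH7_score` and `pH7_SE` are made-up examples); the real header layout may differ.

```python
import csv
import io

# Hypothetical pre-check, not part of wrangle.R. Assumes one header row whose
# value columns combine a condition name with a keyword, e.g. "pH7_score".
IGNORED_KEYWORDS = ("SE", "epsilon")


def score_columns(header):
    """Return the header names that would be treated as score columns."""
    return [
        name
        for name in header[1:]  # the first column holds the mutation names
        if "score" in name and not any(k in name for k in IGNORED_KEYWORDS)
    ]


def read_scores(text, delimiter="\t"):
    """Map each mutation name to {score column: raw string value}."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    header, body = rows[0], rows[1:]
    positions = {name: header.index(name) for name in score_columns(header)}
    return {
        row[0]: {name: row[i] for name, i in positions.items()}
        for row in body
        if row  # skip blank lines
    }
```

If `score_columns` returns an empty list for your file's header, the file probably needs reformatting before `wrangle.R` can use it.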
Another important note is that `wrangle.R` assumes all insertions are `G`, `GS`, or `GSG`, labeled `i1`, `i2`, and `i3` respectively. If you start using other insertions, you will need to modify `wrangle.R`.
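The insertion labels amount to a small lookup table. The helper names below are hypothetical, and since this page doesn't say how insertions are written inside mutation names, the sketch only maps an inserted sequence to its label and flags any sequence that `wrangle.R` would not recognize.

```python
# The three insertions this page says wrangle.R handles, and their labels.
INSERTION_LABELS = {"G": "i1", "GS": "i2", "GSG": "i3"}


def insertion_label(inserted):
    """Return the i1/i2/i3 label for an inserted sequence such as "GS"."""
    try:
        return INSERTION_LABELS[inserted]
    except KeyError:
        raise ValueError(
            f"insertion {inserted!r} is not G, GS, or GSG; wrangle.R would need updating"
        ) from None


def unsupported_insertions(inserted_sequences):
    """Return the distinct inserted sequences that have no label, sorted."""
    return sorted({s for s in inserted_sequences if s not in INSERTION_LABELS})
```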
Start by going to the gnomAD website and searching for your protein of interest. Make sure that the variant you pulled up is exactly the same as the one used in your DMS.
Once you are sure you have the correct variant pulled up, scroll down to the "gnomAD variants" section. Uncheck pLoF, Synonymous, and Other, so that Missense / Inframe indel is the only box checked. Then click Export variants to CSV. Put the downloaded file in the `/dms/data/gnomad/` directory.
The `metaData.tsv` file maps the input files to your protein. It has 4 columns:
- `protein`: The name of the protein. This is how the name will show up in the drop-down menu of the visualization.
- `raw_file`: The exact name of the raw file that you put in the `dms/data/raw_data/` directory.
- `gpcrdb_id`: Only needed if the protein you are adding is a GPCR. To find the `gpcrdb_id`, go to the GPCRdb website and search for the name of your GPCR. The name in parentheses is the `gpcrdb_id`. For example, when I search GPR68, I can see that "ogr1_human" is the `gpcrdb_id`.
- `gnomad_file`: The exact name of the gnomAD file that you put in the `dms/data/gnomad/` directory.
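Before running anything, a quick consistency check on `metaData.tsv` can catch typos in file names. This is a hypothetical Python sketch, assuming the file has a header row with exactly these four column names and that `raw_file` and `gnomad_file` are looked up in `dms/data/raw_data/` and `dms/data/gnomad/` relative to the repository root; `gpcrdb_id` is not checked because it is optional.

```python
import csv
from pathlib import Path

# Assumed header row of metaData.tsv (column order is not checked).
EXPECTED_COLUMNS = {"protein", "raw_file", "gpcrdb_id", "gnomad_file"}
# Assumed sub-directories of data_dir where each referenced file should live.
FILE_COLUMNS = {"raw_file": "raw_data", "gnomad_file": "gnomad"}


def check_metadata(meta_path, data_dir="dms/data"):
    """Return a list of problems in metaData.tsv; an empty list means none found."""
    data_dir = Path(data_dir)
    problems = []
    with open(meta_path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        if set(reader.fieldnames or []) != EXPECTED_COLUMNS:
            return [f"unexpected columns: {reader.fieldnames}"]
        for line_no, row in enumerate(reader, start=2):  # line 1 is the header
            if not (row["protein"] or "").strip():
                problems.append(f"line {line_no}: protein is empty")
            for column, subdir in FILE_COLUMNS.items():
                name = (row[column] or "").strip()
                if not name or not (data_dir / subdir / name).is_file():
                    problems.append(
                        f"line {line_no}: {column} {name!r} not found in {data_dir / subdir}"
                    )
    return problems
```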
The `description.json` file contains information about the protein and your conditions. Create a new entry with the information about your protein and conditions, replacing all the text that starts with "INSERT_". If an entry isn't applicable, you can delete it.
Here are a few important notes:
- Replace "INSERT_NAME_OF_YOUR_PROTEIN" with the name of your protein. This must match the name that you put in the `protein` column in `metaData.tsv`.
- If you have insertions, replace "INSERT_INSERTION_INFO" with something like "i1 = insert G, i2 = insert GS, i3 = insert GSG".
- Add as many conditions as you have. Replace the "INSERT_CONDITION_1" key with the condition name used in the raw data file (the names across the top row).
- Try to keep all the descriptions brief.

```json
"INSERT_NAME_OF_YOUR_PROTEIN": {
  "experiment_description": "INSERT_EXPERIMENT_DESCRIPTION",
  "protein_type": "INSERT_PROTEIN_TYPE",
  "publication_name": "INSERT_NAME_OF_PUBLICATION",
  "publication_link": "INSERT_URL",
  "insertion_info": "INSERT_INSERTION_INFO",
  "conditions": {
    "INSERT_CONDITION_1": {
      "low": "INSERT_CONDITION_1_LOW_EFFECT_SIZE_DESCRIPTION",
      "high": "INSERT_CONDITION_1_HIGH_EFFECT_SIZE_DESCRIPTION",
      "assay_description": "INSERT_CONDITION_1_ASSAY_DESCRIPTION"
    },
    "INSERT_CONDITION_2": {
      "low": "INSERT_CONDITION_2_LOW_EFFECT_SIZE_DESCRIPTION",
      "high": "INSERT_CONDITION_2_HIGH_EFFECT_SIZE_DESCRIPTION",
      "assay_description": "INSERT_CONDITION_2_ASSAY_DESCRIPTION"
    }
  }
}
```
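Leftover placeholders and protein names that don't match `metaData.tsv` are easy mistakes at this step. The hypothetical sketch below assumes the JSON file is a single object keyed by protein name (one entry like the template above per protein) and that `metaData.tsv` has a header row with a `protein` column; it reports both kinds of problem.

```python
import csv
import json


def leftover_placeholders(value, path="$"):
    """Yield JSON paths of keys or string values that still start with INSERT_."""
    if isinstance(value, dict):
        for key, child in value.items():
            if key.startswith("INSERT_"):
                yield f"{path}.{key}"
            yield from leftover_placeholders(child, f"{path}.{key}")
    elif isinstance(value, list):
        for i, child in enumerate(value):
            yield from leftover_placeholders(child, f"{path}[{i}]")
    elif isinstance(value, str) and value.startswith("INSERT_"):
        yield path


def check_descriptions(descriptions_path, meta_path):
    """Report leftover placeholders and metaData.tsv proteins with no JSON entry."""
    with open(descriptions_path) as handle:
        descriptions = json.load(handle)  # assumed: one object keyed by protein
    with open(meta_path, newline="") as handle:
        proteins = {row["protein"] for row in csv.DictReader(handle, delimiter="\t")}
    problems = [f"placeholder left at {p}" for p in leftover_placeholders(descriptions)]
    problems += [
        f"{name!r} is in metaData.tsv but not in the JSON file"
        for name in sorted(proteins - set(descriptions))
    ]
    return problems
```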
If the protein you are adding is a GPCR, go to the GPCRdb website, search for your GPCR, and download an SVG of the snakeplot. Note that the N- and C-terminal domains, as well as the intracellular and extracellular domains, are hidden by default, so before downloading the SVG you must click on the squares to expand them.
After expanding all the squares, click the download button and select the "SVG" option. Save the SVG in the `/dms/data/snake/` directory and rename the file to the protein name you put in `metaData.tsv`, for example `GPR68.svg`.
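To confirm every GPCR has its snakeplot, a hypothetical check like the one below could be run from the repository root. It assumes that a non-empty `gpcrdb_id` in `metaData.tsv` (with a header row) marks a protein as a GPCR, and that the SVG is named exactly `<protein>.svg`.

```python
import csv
from pathlib import Path


def missing_snakeplots(meta_path, snake_dir="dms/data/snake"):
    """List GPCR proteins (non-empty gpcrdb_id) with no <protein>.svg in snake_dir."""
    snake_dir = Path(snake_dir)
    missing = []
    with open(meta_path, newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            if (row.get("gpcrdb_id") or "").strip():  # assumed GPCR marker
                svg = snake_dir / f"{row['protein']}.svg"
                if not svg.is_file():
                    missing.append(row["protein"])
    return missing
```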
`wrangle.R` reads all of the raw data and formats it to be compatible with the visualizer. The output of the script is `dms_data_wrangled.csv`. The script overwrites the `dms_data_wrangled.csv` file that is currently in the `/dms/data/` directory, so if you do not want to lose the original, rename it or save a copy somewhere else.
To run `wrangle.R`, open a terminal in `/dms/data` and run `Rscript wrangle.R`.
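If you want the backup to happen automatically, a minimal Python sketch could copy the current output to a timestamped file and then call `Rscript wrangle.R`. The function names and the backup naming scheme are hypothetical; only the `Rscript wrangle.R` call comes from the steps above.

```python
import shutil
import subprocess
from datetime import datetime
from pathlib import Path


def backup_wrangled(data_dir="dms/data"):
    """Copy dms_data_wrangled.csv to a timestamped file; return its path, or None."""
    source = Path(data_dir) / "dms_data_wrangled.csv"
    if not source.is_file():
        return None  # nothing to protect yet
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    backup = source.with_name(f"dms_data_wrangled.backup-{stamp}.csv")
    shutil.copy2(source, backup)  # copy2 also keeps the file's timestamps
    return backup


def run_wrangle(data_dir="dms/data"):
    """Back up the current output, then run `Rscript wrangle.R` inside data_dir."""
    backup = backup_wrangled(data_dir)
    subprocess.run(["Rscript", "wrangle.R"], cwd=data_dir, check=True)
    return backup
```

The backup is written next to the original in `dms/data/`, so leave it out when you `git add` files later, or delete it once you are happy with the new output.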
Before you push anything to GitHub, I recommend testing the visualizer locally to make sure everything looks the way it should. To do this, install Visual Studio Code, open the Extensions view, and install the "Live Server" extension.
Once it is installed, open the directory where you cloned the repository in Visual Studio Code. Then click the "Go Live" button (it should be on the bottom right-hand side of the screen).
That should open a tab in your web browser. Click on `dms_vis.html` to open the visualization (it may be in the `JGEnglishLab.github.io/` folder). This will pull up a local version of the visualization. Verify that everything looks right; if it does, you are ready to push your data.
To go live, open a terminal where you cloned the repository. If you type `git status`, you should see a list of files that have been changed.
You will need to `git add` the files that you want to commit. Specifically, you should `git add` the following files:
- `metaData.tsv`
- `dms_data_wrangled.csv`
- The file you added to the `dms/data/gnomad` directory
- The file you added to the `dms/data/raw_data` directory
- The file you added to the `dms/data/snake` directory (if a GPCR)
- `description.json`
After you've used `git add` to add those files, run `git status` one more time to make sure that you added everything you want.
Then commit the changes by typing `git commit -m "..."` (replace the `...` with a message describing your changes). Then type `git push`. Once the push has finished, your new data should be live!
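The second `git status` check can also be scripted. The hypothetical sketch below parses `git status --porcelain` output, where the first status column describes the staged (index) state and paths are relative to the repository root, and lists which of the files you expected to add are not staged yet. The example paths are assumptions; adjust them to where the files actually live in your clone.

```python
import subprocess


def staged_paths(porcelain_text):
    """Return paths with a staged change, parsed from `git status --porcelain` (v1)."""
    staged = set()
    for line in porcelain_text.splitlines():
        if len(line) < 4:
            continue
        index_status, path = line[0], line[3:]
        if " -> " in path:  # renamed/copied entries look like "old -> new"
            path = path.split(" -> ", 1)[1]
        # " " = nothing staged, "?" = untracked, "!" = ignored.
        # Note: paths with unusual characters may be quoted by git; not handled here.
        if index_status not in (" ", "?", "!"):
            staged.add(path)
    return staged


def not_yet_staged(expected, repo_dir="."):
    """Run git status in repo_dir and list expected paths that are not staged."""
    out = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    ).stdout
    staged = staged_paths(out)
    return [p for p in expected if p not in staged]


# Example (hypothetical paths, run from the repository root):
# not_yet_staged(["dms/data/dms_data_wrangled.csv", "dms/data/raw_data/GPR68.csv"])
```

An empty list means every expected file is staged. A file you never changed will also show up as "not staged", since git has nothing to add for it.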