Preparing Input Data and config.json
NetProphet requires a collection of input files. Please process and format your input data according to the following descriptions.
FILE/DIRECTORY | DESCRIPTION |
---|---|
FILENAME_EXPRESSION_DATA | A matrix of the log2 fold-change expression values in the samples with respect to those in wild type or control. Rows represent genes, columns represent samples/conditions, i.e. the matrix dimension is number of genes x number of samples. |
FILENAME_DE_ADJMTR | An adjacency matrix of the interactions between regulators and target genes, calculated via differential expression analysis. Rows represent regulators/TFs and columns represent genes, i.e. the matrix dimension is number of regulators x number of target genes. For each possible interaction between regulator i (Ri) and target gene j (Tj), set entry Mij to the signed, log-scale differential expression significance of Tj when Ri is perturbed. If Ri has not been perturbed, set Mij = 0 for all j. See the differential expression score definitions below for details. |
FILENAME_GENES | A list of gene names. Capitalized systematic names are recommended. |
FILENAME_REGULATORS | A list of gene names that encode transcription factors (TFs). These regulators must be included in the list of gene names. The regulator names should have the same naming scheme as the gene names. |
FILENAME_SAMPLE_CONDITIONS | A list of samples/conditions. If a gene was perturbed in a condition, use the gene name as the condition name; otherwise, use any identifier that contains no spaces. |
FILENAME_PROMOTERS | The promoter sequences of the target genes in fasta format. The header of each promoter is the gene name only. |
DIR_DBD_PID | A directory of the percent identities (PIDs) between the DNA binding domains (DBDs). Each file is named after the regulator associated with a DBD and has two columns: the first column lists the names of the regulators associated with the other DBDs, and the second column gives the corresponding PID, calculated beforehand. See the supplemental package referenced below for details. |
NOTE: The example input data provided in this repo is used for mapping a small yeast subnetwork. Visit http://mblab.wustl.edu/software.html for the resources for mapping the whole TF networks in yeast and fruit fly.
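The constraints in the table above can be checked before running anything. The following is a minimal sketch, not part of NetProphet: the function name `check_input_consistency` is hypothetical, the inputs are assumed to be already loaded into Python lists and `(rows, columns)` tuples, and the target genes of the DE matrix are assumed to be the full gene list.

```python
def check_input_consistency(genes, regulators, conditions, expr_shape, de_shape):
    """Return a list of problems; an empty list means the inputs agree.

    Hypothetical helper. genes, regulators, and conditions are lists of
    names; expr_shape and de_shape are (rows, columns) tuples.
    """
    problems = []
    gene_set = set(genes)

    # Regulators must be included in the list of gene names.
    absent = [r for r in regulators if r not in gene_set]
    if absent:
        problems.append(f"regulators missing from gene list: {absent}")

    # Condition names must not contain whitespace.
    spaced = [c for c in conditions if any(ch.isspace() for ch in c)]
    if spaced:
        problems.append(f"condition names containing spaces: {spaced}")

    # Expression matrix: number of genes x number of samples.
    if expr_shape != (len(genes), len(conditions)):
        problems.append(f"expression matrix shape {expr_shape} does not match "
                        f"{(len(genes), len(conditions))}")

    # DE adjacency matrix: number of regulators x number of target genes
    # (assumption: the target genes are the full gene list).
    if de_shape != (len(regulators), len(genes)):
        problems.append(f"DE matrix shape {de_shape} does not match "
                        f"{(len(regulators), len(genes))}")

    return problems
```

Calling it with consistent inputs returns an empty list; each violated constraint adds one message.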
For each TF perturbation and each gene in the perturbation condition, we recommend using LIMMA to calculate the log odds that the gene is differentially expressed in the perturbation condition compared to the wild type (WT) condition. The differential expression component is a signed confidence score Dij, which is calculated from the log odds score Li(j) and the log2 fold change Yi(j) of gene j and TF i as follows.
Dij = Li(j) * sgn(Yi(j)) when Li(j) > 0, and Dij = 0 when Li(j) <= 0
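As an illustration of the LIMMA-based rule, here is a minimal sketch for a single TF/gene pair. The function names are hypothetical, and the log odds and log2 fold change are assumed to have been extracted from the LIMMA output already.

```python
def sign(value):
    """Return -1, 0, or 1 according to the sign of value."""
    return (value > 0) - (value < 0)


def signed_de_score_limma(log_odds, log2_fc):
    """Signed confidence score Dij from log odds Li(j) and log2 fold change Yi(j).

    Hypothetical helper: the score is the log odds carrying the sign of the
    fold change when the log odds are positive, and 0 otherwise.
    """
    if log_odds <= 0:
        return 0.0
    return log_odds * sign(log2_fc)
```

For example, log odds of 2.5 with a log2 fold change of -1.2 gives a score of -2.5, while any gene with non-positive log odds gets 0.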
Alternatively, for RNA-seq data, we recommend using Cuffdiff to calculate, for each TF perturbation, the significance of differential expression (i.e. the uncorrected p-value and the FDR-adjusted p-value) of each gene in the perturbation condition compared to the WT condition. The differential expression component is a signed confidence score Dij, which is calculated from the uncorrected p-value Pi(j), the FDR-adjusted p-value Fi(j), and the log2 fold change Yi(j) of gene j and TF i as follows.
Dij = -ln(Pi(j)) * sgn(Yi(j)) when Fi(j) <= 0.05, and Dij = 0 when Fi(j) > 0.05
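The Cuffdiff-based rule can be sketched the same way for a single TF/gene pair. The function name and the `fdr_cutoff` parameter are hypothetical. An FDR-adjusted p-value of exactly 0.05 is treated as significant, following the first condition of the formula, and a p-value of 0 is rejected because its logarithm is undefined.

```python
import math


def signed_de_score_cuffdiff(p_value, fdr, log2_fc, fdr_cutoff=0.05):
    """Signed confidence score Dij from Pi(j), Fi(j), and Yi(j).

    Hypothetical helper: returns -ln(p) carrying the sign of the fold change
    when the FDR-adjusted p-value passes the cutoff, and 0 otherwise.
    """
    if fdr > fdr_cutoff:
        return 0.0
    if p_value <= 0:
        # Assumption: the caller decides how to handle p-values reported as 0.
        raise ValueError("p_value must be greater than 0")
    direction = (log2_fc > 0) - (log2_fc < 0)
    return -math.log(p_value) * direction
```

A down-regulated gene with p = 0.001 and FDR = 0.01 scores ln(0.001), roughly -6.91; a gene whose FDR exceeds the cutoff scores 0 regardless of its p-value.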
To calculate the percent identities of TFs' DNA-binding domains, see the supplemental package at https://github.com/yiming-kang/DBD_PercentIdentity_Calculation for details.
PARAMETER | DESCRIPTION |
---|---|
PROMOTER_LENGTH | The length of your promoter definition in bp. |
MOTIF_THRESHOLD | The threshold on the robustness score calculated in FIRE's jack-knife validation. Choose a value between 16 (default) and 20. The NetProphet 2.0 paper used a threshold of 20. |
Here, let's configure your input data, parameters, and output path in config.json. All required input files are in the directory RESOURCES/, and the output network file is in the directory OUTPUT/.
{
"NETPROPHET2_DIR": "/path/to/NetProphet_2.0",
"RESOURCES_DIR": "RESOURCES",
"OUTPUT_DIR": "OUTPUT",
"FILENAME_EXPRESSION_DATA": "data.expr",
"FILENAME_DE_ADJMTR": "signed.de.adj",
"FILENAME_GENES": "genes",
"FILENAME_REGULATORS": "regulators",
"FILENAME_SAMPLE_CONDITIONS": "conditions",
"DBD_PID_DIR": "DBD_PIDS",
"FILENAME_PROMOTERS": "promoter.fasta",
"MOTIF_THRESHOLD": 16,
"FILENAME_NETPROPHET2_NETWORK": "netprophet2_network.adjmtr"
}
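Before launching a run, it can help to confirm that every input named in config.json actually exists. The sketch below is not part of NetProphet: `load_config`, `missing_inputs`, and the `base_dir` parameter are hypothetical, and it assumes that each input name is resolved relative to `RESOURCES_DIR`, which is itself resolved relative to `base_dir`.

```python
import json
import os

# Keys from the example config.json whose values name inputs in RESOURCES_DIR.
INPUT_KEYS = [
    "FILENAME_EXPRESSION_DATA",
    "FILENAME_DE_ADJMTR",
    "FILENAME_GENES",
    "FILENAME_REGULATORS",
    "FILENAME_SAMPLE_CONDITIONS",
    "FILENAME_PROMOTERS",
    "DBD_PID_DIR",
]


def load_config(path):
    """Read config.json into a dictionary."""
    with open(path) as handle:
        return json.load(handle)


def missing_inputs(config, base_dir="."):
    """Return the sorted config keys that are absent or point to nothing on disk."""
    resources = os.path.join(base_dir, config["RESOURCES_DIR"])
    missing = []
    for key in INPUT_KEYS:
        if key not in config:
            missing.append(key)
        elif not os.path.exists(os.path.join(resources, config[key])):
            missing.append(key)
    return sorted(missing)
```

Calling `missing_inputs(load_config("config.json"))` from the directory that contains RESOURCES/ returns an empty list when all inputs are in place.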