Database schema
cmonkey-python writes the results of its computation to an SQLite database. SQLite was chosen because it is a free, open source, portable data store that is available on many systems and has programming interfaces for a large number of programming languages. Another important aspect is that the entire database is stored in a single file, which can easily be copied, archived and analyzed. This section explains the database structure and its function in further detail.
Note 1: The tables ending in _stats are only used in the cluster_viewer application and are subject to change.
Note 2: SQLite differs from other RDBMS in that each table has an implicit column rowid that acts like an auto-incremented, integer-valued primary key. It is normally not shown in the frontend, but it is listed here for clarity.
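The behaviour described in Note 2 can be observed with Python's built-in sqlite3 module. The following is a minimal sketch that assumes nothing about a real cmonkey result file: it builds a throwaway in-memory table shaped like row_names, and the gene names are made up for illustration.

```python
import sqlite3

# In-memory stand-in for a result database (illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE row_names (order_num int, name text)")
conn.executemany(
    "INSERT INTO row_names (order_num, name) VALUES (?, ?)",
    [(0, "geneA"), (1, "geneB")],  # made-up sample rows
)

# SELECT * does not include the implicit rowid column ...
star_rows = conn.execute(
    "SELECT * FROM row_names ORDER BY order_num"
).fetchall()

# ... but it can be requested explicitly by name.
explicit_rows = conn.execute(
    "SELECT rowid, * FROM row_names ORDER BY order_num"
).fetchall()

print(star_rows)
print(explicit_rows)
```

Because rowid is hidden from SELECT *, queries that follow a reference such as motif_info_id need to name rowid explicitly.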
run_infos
rowid int
start_time timestamp
finish_time timestamp
num_iterations int
last_iteration int
organism text
species text
ncbi_code int
num_rows int
num_columns int
num_clusters int
git_sha text
This table represents the current information about a cmonkey run and only stores a single entry that is continuously updated until a run is finished.
row_names, column_names
rowid int
order_num int
name text
These two tables are structurally identical. They reflect the structure of the input gene expression matrix. To preserve the order of the rows and columns, their order is stored as well.
row_members, column_members
rowid int
iteration int
cluster int
order_num int
These tables contain the row and column members for each iteration and cluster. The element order_num references an order_num in its respective row_names/column_names table.
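The order_num reference can be resolved with a join to recover the names of a cluster's members. The sketch below assumes only the columns listed above. The helper name cluster_row_names, the in-memory database and the sample data are hypothetical. A real analysis would open the result file produced by a run instead.

```python
import sqlite3


def cluster_row_names(conn, iteration, cluster):
    """Return the row names of one cluster at one iteration, in matrix order."""
    query = """
        SELECT rn.name
        FROM row_members AS rm
        JOIN row_names AS rn ON rn.order_num = rm.order_num
        WHERE rm.iteration = ? AND rm.cluster = ?
        ORDER BY rn.order_num
    """
    return [name for (name,) in conn.execute(query, (iteration, cluster))]


# Throwaway in-memory database with made-up data (illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE row_names (order_num int, name text)")
conn.execute("CREATE TABLE row_members (iteration int, cluster int, order_num int)")
conn.executemany(
    "INSERT INTO row_names VALUES (?, ?)",
    [(0, "geneA"), (1, "geneB"), (2, "geneC")],
)
conn.executemany(
    "INSERT INTO row_members VALUES (?, ?, ?)",
    [(1, 1, 2), (1, 1, 0), (1, 2, 1), (2, 1, 1)],
)

members = cluster_row_names(conn, iteration=1, cluster=1)
print(members)
```

The same query should work for columns by substituting column_members and column_names, since the tables are structurally identical.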
cluster_stats
rowid int
iteration int
cluster int
num_rows int
num_cols int
residual decimal
Stores the residual value and the number of rows and columns for each iteration and cluster.
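As an illustration of how this table might be queried, the sketch below lists the clusters with the lowest residual at a given iteration. The helper lowest_residual_clusters and the sample values are hypothetical, and since the _stats tables are subject to change, the column names should be checked against the actual database first.

```python
import sqlite3


def lowest_residual_clusters(conn, iteration, limit=3):
    """Return (cluster, num_rows, num_cols, residual) sorted by residual."""
    query = """
        SELECT cluster, num_rows, num_cols, residual
        FROM cluster_stats
        WHERE iteration = ?
        ORDER BY residual ASC
        LIMIT ?
    """
    return conn.execute(query, (iteration, limit)).fetchall()


# Throwaway in-memory database with made-up data (illustration only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cluster_stats "
    "(iteration int, cluster int, num_rows int, num_cols int, residual decimal)"
)
conn.executemany(
    "INSERT INTO cluster_stats VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1, 20, 10, 0.45),
        (1, 2, 15, 12, 0.31),
        (1, 3, 30, 8, 0.52),
        (2, 1, 22, 10, 0.40),
    ],
)

best = lowest_residual_clusters(conn, iteration=1, limit=2)
print(best)
```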
motif_infos
rowid int
iteration int
cluster int
seqtype text
motif_num int
evalue decimal
Basic information about a motif that cmonkey thinks is associated with a specific cluster.
meme_motif_sites
rowid int
motif_info_id int /* references motif_infos.rowid */
seq_name int
reverse boolean
start int
pvalue decimal
flank_left text
seq text
flank_right text
Detailed positional MEME information for a motif.
motif_annotations
rowid int
motif_info_id int /* references motif_infos.rowid */
iteration int
gene_num int
position int
reverse boolean
pvalue decimal
Positional information for a motif that was obtained from MAST.
motif_pssm_rows
rowid int
motif_info_id int /* references motif_infos.rowid */
iteration int
row int
a decimal
c decimal
g decimal
t decimal
Rows of the PSSM for a motif.
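A full PSSM can be reassembled by selecting all rows for one motif and ordering them by the row column. This is a minimal sketch: the helper load_pssm and the sample values are hypothetical, and whether row numbering starts at 0 or 1 in a real database is not stated here. The column name row is quoted because ROW is also an SQL keyword.

```python
import sqlite3


def load_pssm(conn, motif_info_id):
    """Return the PSSM of one motif as a list of (a, c, g, t) tuples."""
    query = """
        SELECT a, c, g, t
        FROM motif_pssm_rows
        WHERE motif_info_id = ?
        ORDER BY "row"
    """
    return conn.execute(query, (motif_info_id,)).fetchall()


# Throwaway in-memory database with made-up data (illustration only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE motif_pssm_rows "
    '(motif_info_id int, iteration int, "row" int, '
    "a decimal, c decimal, g decimal, t decimal)"
)
conn.executemany(
    "INSERT INTO motif_pssm_rows VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (1, 1, 1, 0.1, 0.1, 0.7, 0.1),  # inserted out of order on purpose
        (1, 1, 0, 0.7, 0.1, 0.1, 0.1),
        (2, 1, 0, 0.25, 0.25, 0.25, 0.25),
    ],
)

pssm = load_pssm(conn, motif_info_id=1)
for position, (a, c, g, t) in enumerate(pssm):
    print(position, a, c, g, t)
```

In a real database the motif_info_id value would come from the rowid of the motif_infos entry of interest.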
global_background
rowid int
subsequence text
pvalue decimal
If the run uses a global background file, this table stores the entries that were generated.
statstypes
rowid int
category text
name text
iteration_stats
rowid int
statstype int
iteration int
score decimal
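The last two tables carry no description here. The sketch below assumes, by analogy with motif_info_id, that iteration_stats.statstype references statstypes.rowid. That relationship, the helper stat_series, and the category and name values in the sample data are assumptions and should be verified against an actual database.

```python
import sqlite3


def stat_series(conn, stat_name):
    """Return (iteration, score) pairs for one named statistic."""
    # Assumption: iteration_stats.statstype references statstypes.rowid.
    query = """
        SELECT s.iteration, s.score
        FROM iteration_stats AS s
        JOIN statstypes AS t ON t.rowid = s.statstype
        WHERE t.name = ?
        ORDER BY s.iteration
    """
    return conn.execute(query, (stat_name,)).fetchall()


# Throwaway in-memory database with made-up data (illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE statstypes (category text, name text)")
conn.execute("CREATE TABLE iteration_stats (statstype int, iteration int, score decimal)")
conn.executemany(
    "INSERT INTO statstypes VALUES (?, ?)",
    [("main", "example_stat"), ("other", "another_stat")],  # hypothetical names
)
conn.executemany(
    "INSERT INTO iteration_stats VALUES (?, ?, ?)",
    [(1, 2, 0.41), (1, 1, 0.48), (2, 1, 0.12)],
)

series = stat_series(conn, "example_stat")
print(series)
```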