chunquanlipathway/FunlncModel

FunlncModel

A machine learning computational framework for the identification of functional long noncoding RNAs by integrating multi-omics network

Accumulating evidence indicates that long non-coding RNAs (lncRNAs) play important roles in molecular and cellular biology. Although many algorithms have been developed to reveal their associations with complex diseases by using downstream targets, upstream (epi)genetic regulatory information has not been sufficiently leveraged to predict the function of lncRNAs in various biological processes. We therefore present FunlncModel, an interpretable machine learning framework that screens for functional lncRNAs by integrating a large number of (epi)genetic and functional genomic features from their upstream/downstream multi-omic regulatory networks. We used the random forest method to mine nearly 60 features in three categories from over 2,000 datasets across 11 data types, including transcription factors (TFs), histone modifications, typical enhancers (TEs), super-enhancers (SEs), methylation sites, mRNAs, etc.

In human embryonic stem cells (hESCs), FunlncModel outperformed alternative methods in classification performance (0.95 AUROC and 0.97 AUPRC). It not only recovered most of the known lncRNAs that influence stem cell states, but also discovered novel high-confidence functional lncRNAs (HCFun_lncs). We extensively validated FunlncModel's efficacy on up to 27 cancer-related functional prediction tasks, which involved multiple cancer cell growth processes and cancer hallmarks. We also found that (epi)genetic regulatory features, such as TFs and histone modifications, are strong predictors of lncRNA function. Overall, FunlncModel is a strong and stable prediction model for identifying functional lncRNAs in specific cellular contexts.

### Train your own model

To train your own model, refer to the commented script 'Train_model_code.R'. Training requires loading the caret R package and preparing a feature matrix [see 'Feature_matrix(HESC).csv' for the expected format]. The code and input examples can be found in this project and at https://bio.liclab.net/FunlncModel/download.php.
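The training step described above can be sketched roughly as follows. This is a minimal illustration, not the contents of 'Train_model_code.R': the synthetic data, the feature names (`tf_binding`, `histone_mark`, `se_overlap`), the class labels, and all tuning settings are assumptions made for the example. It only shows the general shape of fitting a random forest through caret (which needs the randomForest package for `method = "rf"`).

```r
# Minimal sketch: train a random forest with caret on a feature matrix.
# All names and settings below are hypothetical, chosen for illustration.
library(caret)  # method = "rf" additionally requires the randomForest package

set.seed(42)

# Synthetic stand-in for a real feature matrix: one row per lncRNA,
# one numeric column per feature, plus a two-level class label.
n <- 200
features <- data.frame(
  tf_binding   = rnorm(n),
  histone_mark = rnorm(n),
  se_overlap   = rnorm(n)
)
score <- features$tf_binding + features$histone_mark + rnorm(n, sd = 0.5)
label <- factor(ifelse(score > 0, "functional", "background"),
                levels = c("functional", "background"))

# 5-fold cross-validation to estimate accuracy during training.
ctrl <- trainControl(method = "cv", number = 5)

rf_model <- train(
  x = features, y = label,
  method    = "rf",
  trControl = ctrl,
  tuneGrid  = data.frame(mtry = 2),  # features tried at each split
  ntree     = 200                    # passed through to randomForest
)

print(rf_model)

# Save the fitted model so it can be reloaded later for prediction.
model_path <- file.path(tempdir(), "my-trained-model.Rdata")
save(rf_model, file = model_path)
```

With a real feature matrix, the synthetic `features` and `label` objects would be replaced by the columns read from your own CSV file; check the column layout of 'Feature_matrix(HESC).csv' before adapting this.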

### Predict with the trained model

To apply the trained model to your own data, refer to the commented script 'Prediction_code.R'. Prediction requires loading the caret R package and trained-model.Rdata, and preparing your own feature matrix [see 'Feature_matrix(HESC).csv' for the expected format]. The trained model from this study can be found at https://bio.liclab.net/FunlncModel/download.php.
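As a rough sketch of the prediction workflow above, the example below loads a saved model and scores new rows. It is not the contents of 'Prediction_code.R'. To stay self-contained it first builds and saves a small stand-in model, and the feature names (`f1`, `f2`), class labels, and file name are hypothetical. Since the name of the object stored in the published trained-model.Rdata is not stated here, the sketch uses the return value of `load()` to find it rather than guessing.

```r
# Minimal sketch: load a saved caret model and predict on new data.
# The stand-in model, feature names, and labels are hypothetical.
library(caret)  # method = "rf" additionally requires the randomForest package

set.seed(1)

# --- Stand-in for a previously trained model (illustration only) ---
n <- 150
train_x <- data.frame(f1 = rnorm(n), f2 = rnorm(n))
train_y <- factor(ifelse(train_x$f1 + train_x$f2 > 0,
                         "functional", "background"))
demo_model <- train(
  x = train_x, y = train_y, method = "rf",
  trControl = trainControl(method = "none", classProbs = TRUE),
  tuneGrid  = data.frame(mtry = 1), ntree = 100
)
model_path <- file.path(tempdir(), "demo-model.Rdata")
save(demo_model, file = model_path)

# --- Prediction workflow ---
# load() returns the names of the restored objects, so the model can be
# retrieved without knowing in advance what it was called when saved.
loaded_names <- load(model_path)
model <- get(loaded_names[1])

# New feature matrix: same feature columns as used in training.
new_x <- data.frame(f1 = rnorm(10), f2 = rnorm(10))
rownames(new_x) <- paste0("lncRNA_", seq_len(10))
stopifnot(all(colnames(train_x) %in% colnames(new_x)))

# Class probabilities and hard class calls for each row.
probs  <- predict(model, newdata = new_x, type = "prob")
result <- data.frame(
  lncRNA    = rownames(new_x),
  probs,
  predicted = predict(model, newdata = new_x)
)

# Rank candidates by predicted probability of the "functional" class.
result <- result[order(-result$functional), ]
print(head(result))
```

The column check before `predict()` is a design choice for the sketch: a mismatch between training and prediction features is a common source of errors, so it is cheaper to fail early with a clear message.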

### Online server

The FunlncModel online server is freely available at https://bio.liclab.net/FunlncModel/. For guidance on using the FunlncModel web server, see https://bio.liclab.net/FunlncModel/tutorials.php.
