MT-CNN is a CNN for Natural Language Processing and Information Extraction from free-form texts. BSEC group designed the model for information extraction from cancer pathology reports.

CBIIT/NCI-DOE-Collab-Pilot3-Multitask-Convolutional-Neural-Network

Multitask Convolutional Neural Network (MT-CNN)

Description

MT-CNN is a CNN for natural language processing (NLP) and information extraction from free-form texts. This model extracts information from cancer pathology reports.
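To illustrate what "multitask" means for a text CNN, here is a minimal, untrained sketch in plain Python. It is not the repository's implementation. It assumes a common design in which one shared set of convolutional filters produces features that feed several task-specific softmax classifiers. The class name `TinyMultitaskCNN`, the filter widths (3, 4, 5), the task names (`site`, `histology`), and all sizes are hypothetical choices made for this example.

```python
import math
import random


def conv1d_max(embedded, filt):
    """Slide one filter over the token sequence, apply ReLU, and max-pool over time."""
    width = len(filt)
    best = 0.0  # ReLU outputs are never negative, so 0.0 is a safe starting maximum
    for start in range(len(embedded) - width + 1):
        window = embedded[start:start + width]
        act = sum(w * x
                  for filt_row, emb_row in zip(filt, window)
                  for w, x in zip(filt_row, emb_row))
        best = max(best, act)
    return best


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def rand_matrix(rng, rows, cols):
    """Random weights; a real model would learn these during training."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class TinyMultitaskCNN:
    """Hypothetical illustration: shared convolutional features, one output head per task."""

    def __init__(self, vocab_size, embed_dim, filter_widths,
                 filters_per_width, task_classes, seed=0):
        rng = random.Random(seed)
        self.embedding = rand_matrix(rng, vocab_size, embed_dim)
        # Filters of several widths, shared by every task.
        self.filters = [rand_matrix(rng, width, embed_dim)
                        for width in filter_widths
                        for _ in range(filters_per_width)]
        # One linear classifier per task, each reading the same shared features.
        self.heads = {task: rand_matrix(rng, n_classes, len(self.filters))
                      for task, n_classes in task_classes.items()}

    def predict(self, token_ids):
        embedded = [self.embedding[t] for t in token_ids]
        features = [conv1d_max(embedded, f) for f in self.filters]
        return {task: softmax([sum(w * x for w, x in zip(row, features))
                               for row in weights])
                for task, weights in self.heads.items()}


model = TinyMultitaskCNN(vocab_size=50, embed_dim=4, filter_widths=(3, 4, 5),
                         filters_per_width=2,
                         task_classes={"site": 3, "histology": 5})
probabilities = model.predict([1, 2, 3, 4, 5, 6, 7, 8])
```

`probabilities` holds one probability distribution per task, so a single pass over a report yields a prediction for every task at once. Because the weights here are random, the values only demonstrate the data flow.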

User Community

Data scientists interested in classifying free-form texts, such as pathology reports, clinical trials, and abstracts.

Usability

Data scientists can train the provided untrained model on their own data, or use the trained model to classify the provided test samples. The provided scripts use pathology reports that have been downloaded from the Genomic Data Commons (GDC), converted to text format, cleaned, and preprocessed. Here is an example report.
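For readers preparing their own data, the sketch below shows one plausible way to clean a report and turn it into a fixed-length sequence of integer IDs. It is not the repository's preprocessing script. The function names (`clean_report`, `build_vocab`, `encode`), the cleaning rule, and the padding scheme are assumptions made for illustration.

```python
import re
from collections import Counter

PAD, UNK = 0, 1  # reserved IDs for padding and out-of-vocabulary tokens


def clean_report(text):
    """Lowercase the text, replace punctuation with spaces, and split on whitespace."""
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    return text.split()


def build_vocab(token_lists, min_count=1):
    """Assign an integer ID to every token seen at least min_count times."""
    counts = Counter(tok for tokens in token_lists for tok in tokens)
    vocab = {"<pad>": PAD, "<unk>": UNK}
    for tok, n in sorted(counts.items()):
        if n >= min_count:
            vocab[tok] = len(vocab)
    return vocab


def encode(tokens, vocab, max_len):
    """Map tokens to IDs, truncating or padding to exactly max_len entries."""
    ids = [vocab.get(tok, UNK) for tok in tokens[:max_len]]
    return ids + [PAD] * (max_len - len(ids))


tokens = clean_report("Invasive ductal carcinoma, grade 2.")
vocab = build_vocab([tokens])
encoded = encode(tokens, vocab, max_len=8)
```

Padding every report to the same length lets a batch of reports be stored as a single rectangular array, which is the form a CNN typically expects as input.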

Uniqueness

Classification of unstructured text is a classic problem in natural language processing. The community has developed state-of-the-art models such as BERT, BioBERT, and the Transformer. This model has the advantage of handling relatively long reports (that is, over 400 words), and it shows robustness in accuracy and speed with a relatively small number of unstructured pathology reports.

Components

The following components are in the Model and Data Clearinghouse (MoDaC):

Technical Details

Refer to this README.

Author

Biomedical Sciences, Engineering, and Computing (BSEC) Group; Computer Sciences and Engineering Division; Oak Ridge National Laboratory
