Constructing an effective and practical multi-step feature selection framework for electronic medical records

Overview

This repository contains Jupyter notebooks implementing a multi-step feature selection framework for identifying significant variables in electronic medical records (EMR) data for clinical outcome prediction. This framework optimizes the selection process across multiple stages to improve model accuracy, stability, and interpretability.

The notebooks can be used to reproduce the figures in our paper, as outlined in the table below.

Notebook	Description	Figures
1-tableone_fs1.ipynb	Data reading, feature distribution, initial feature selection	-
2-fs1_10ml_metrics.ipynb	ML model comparison based on initial feature selection	Figure 2
3-feature_diversity.ipynb	Feature importance ranking with 6 methods, visualizing differences across methods and samples	Figure 3
4-fs2_trends.ipynb	Trends in accuracy, similarity, and stability for the 6 methods	Figure 4
5-fs2_topk.ipynb	Selection of optimal top-K values based on accuracy, similarity, and stability trends	Supplementary Figures S1 & S2
6-fs0123_metrics.ipynb	Performance evaluation across the four stages of the multi-step feature selection process	Figure 5, Supplementary Figure S3
7-shap.ipynb	SHAP analysis of the final model to interpret feature impact	Figure 6, Supplementary Figure S4
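
The similarity and stability comparisons mentioned in the table can be illustrated with a small sketch. It is not taken from the notebooks: it assumes the Jaccard index as the agreement measure between top-K feature sets, which is one common choice and may differ from the metric used in the paper. The function names and the importance scores are made up for illustration.

```python
from itertools import combinations


def top_k(importances, k):
    """Return the set of the k feature names with the largest importance scores."""
    ranked = sorted(importances, key=importances.get, reverse=True)
    return set(ranked[:k])


def jaccard(a, b):
    """Jaccard index of two sets: |a & b| / |a | b| (1.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def mean_pairwise_jaccard(rankings, k):
    """Average Jaccard index over all pairs of top-k feature sets."""
    sets = [top_k(r, k) for r in rankings]
    pairs = list(combinations(sets, 2))
    if not pairs:
        raise ValueError("need at least two rankings to compare")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical importance scores from three ranking methods (assumed values,
# not results from the paper).
rankings = [
    {"age": 0.9, "creatinine": 0.8, "lactate": 0.3, "heart_rate": 0.1},
    {"age": 0.7, "creatinine": 0.2, "lactate": 0.6, "heart_rate": 0.1},
    {"age": 0.5, "creatinine": 0.9, "lactate": 0.4, "heart_rate": 0.2},
]

# Agreement between methods as K grows.
for k in (1, 2, 3):
    print(k, round(mean_pairwise_jaccard(rankings, k), 3))
```

Passing rankings from different methods gives a between-method similarity, while passing rankings from the same method fitted on different resamples gives a stability estimate. Plotting either against K is one way to look for a top-K value where agreement levels off.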

Data

The project uses the following datasets:

MIMIC-III v1.4 (for ICU AKI prediction)
MIMIC-IV-ED v2.0 (for ED-to-ICU mortality prediction)

Both datasets require credentialed access and can be downloaded from PhysioNet.

Contact

Feel free to contact us or open an issue if you have any questions.
