Skip to content

hongnianwang/MultiStep_EMR_FS

Repository files navigation

Constructing an effective and practical multi-step feature selection framework for electronic medical records

Overview

This repository contains Jupyter notebooks implementing a multi-step feature selection framework for identifying significant variables in electronic medical records (EMR) data for clinical outcome prediction. This framework optimizes the selection process across multiple stages to improve model accuracy, stability, and interpretability.

The notebooks can be used to reproduce the figures in our paper, as outlined in the table below.

Notebook Description Figures
1_tableone_fs1.ipynb Data reading, feature distribution, initial feature selection -
2_fs1_10ml_metrics.ipynb ML model comparison based on initial feature selection Figure 2
3_feature_diversity.ipynb Feature importance ranking with 6 methods, visualizing differences across methods and samples Figure 3
4_fs2_trends.ipynb Analyzes trends in accuracy, similarity, and stability for 6 methods Figure 4
5_fs2_topk.ipynb Determines optimal top-K values based on accuracy, similarity, and stability trends Supplementary Figures S1 & S2
6_fs0123_metrics.ipynb Evaluates performance across four stages of the multi-step feature selection process Figure 5, Supplementary Figure S3
7_shap_analysis.ipynb SHAP analysis on the final model to interpret feature impact Figure 6, Supplementary Figure S4

Data

The project uses the following datasets:

  • MIMIC-III v1.4 (for ICU AKI prediction)
  • MIMIC-IV-ED v2.0 (for ED-to-ICU mortality prediction)

These datasets require credentialed access, and can be downloaded from PhysioNet.

Contact

Feel free to contact us or open an issue if you have any questions.

About

Multi-step feature selection framework for EMR data

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published