Is your feature request related to a problem? Please describe.
Collaborators often face challenges in contributing to federated learning training due to the scarcity of labeled data. Unlabeled data is generally far more abundant than labeled data, but without a way to train on it, this resource remains underutilized.
Describe the solution you'd like
Introduce Self-Supervised Learning (SSL) algorithms into OpenFL to enable training on unlabeled data. This can be achieved by creating workflows that use techniques such as Masked Autoencoders (MAE) or DINOv2. These algorithms pretrain models on unlabeled data, which can then be fine-tuned on labeled data for specific tasks. The final model is expected to achieve better accuracy than models trained solely on labeled data.
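For illustration, here is a minimal sketch of what MAE-style pretraining on unlabeled data could look like in plain PyTorch. The `TinyMAE` model, patch/embedding sizes, and the commented-out `unlabeled_loader` are hypothetical placeholders; the real integration would wrap this in an OpenFL task runner rather than a standalone loop.

```python
# Minimal MAE-style pretraining sketch on unlabeled 2D slices (assumptions:
# single-channel 128x128 inputs, patch size 16, toy encoder/decoder).
import torch
import torch.nn as nn

class TinyMAE(nn.Module):
    def __init__(self, patch=16, dim=256, mask_ratio=0.75):
        super().__init__()
        self.patch, self.mask_ratio = patch, mask_ratio
        self.encoder = nn.Sequential(nn.Linear(patch * patch, dim), nn.GELU(), nn.Linear(dim, dim))
        self.decoder = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, patch * patch))

    def patchify(self, x):
        # x: (B, 1, H, W) -> (B, N, patch*patch)
        p = self.patch
        B = x.shape[0]
        x = x.unfold(2, p, p).unfold(3, p, p)           # (B, 1, H/p, W/p, p, p)
        return x.contiguous().view(B, -1, p * p)

    def forward(self, x):
        patches = self.patchify(x)                       # (B, N, p*p)
        mask = torch.rand(patches.shape[:2], device=x.device) < self.mask_ratio
        visible = patches.masked_fill(mask.unsqueeze(-1), 0.0)
        recon = self.decoder(self.encoder(visible))
        # Reconstruction loss is computed only on the masked patches
        return ((recon - patches) ** 2)[mask].mean()

model = TinyMAE()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
# `unlabeled_loader` is a placeholder for a DataLoader over label-free slices:
# for images in unlabeled_loader:
#     loss = model(images)
#     optimizer.zero_grad(); loss.backward(); optimizer.step()
```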
Proposed Integration: OpenFL Workspaces for Pretraining and Fine-Tuning
Objective:
To enhance the functionality of OpenFL by creating two distinct workspaces: one dedicated to pretraining and another for fine-tuning. This will enable users to understand and compare the benefits of SSL pretraining.
Dataset:
We propose using the BraTS2020 dataset, which is already approved for use at Intel. The dataset can be accessed here.
Features:
Pretraining Workspace:
Users can run the pretraining workflow using the full BraTS2020 dataset.
This workspace will use the full dataset, without labels, to build a robust initial model for fine-tuning (a label-free data loader sketch follows below).
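As a sketch of what "without labels" means in practice, the pretraining workspace's dataset could simply never load segmentation masks. The file-naming convention, the chosen modality, and the use of `nibabel` below are assumptions; the actual OpenFL data loader subclass is omitted.

```python
# Label-free dataset sketch for the pretraining workspace (assumes BraTS2020
# volumes stored as NIfTI files under data_dir and nibabel being available).
from pathlib import Path
import nibabel as nib
import numpy as np
import torch
from torch.utils.data import Dataset

class UnlabeledBraTSDataset(Dataset):
    """Returns image volumes only; segmentation masks are never loaded."""

    def __init__(self, data_dir: str, modality: str = "flair"):
        # Hypothetical layout: <case>/<case>_<modality>.nii.gz
        self.paths = sorted(Path(data_dir).glob(f"*/*_{modality}.nii.gz"))

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        volume = nib.load(str(self.paths[idx])).get_fdata().astype(np.float32)
        volume = (volume - volume.mean()) / (volume.std() + 1e-6)   # simple z-score normalization
        return torch.from_numpy(volume).unsqueeze(0)                # (1, H, W, D), no label
```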
Fine-Tuning Workspace:
Users can run the fine-tuning workflow using a subset of the BraTS2020 dataset.
This workspace will allow for more focused training to refine the pretrained model for specific tasks or datasets.
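A rough sketch of how the fine-tuning workspace could reuse the pretrained encoder is shown below. The checkpoint path, the `TinyMAE` model from the earlier sketch, and the per-patch classification head are all illustrative stand-ins for the real architecture.

```python
# Fine-tuning sketch: load the pretrained encoder, attach a task head, and
# train on the labeled subset. Names and shapes here are illustrative only.
import torch
import torch.nn as nn

pretrained = TinyMAE()
pretrained.load_state_dict(torch.load("pretrained_mae.pt"))  # output of the pretraining workspace

class FineTuneModel(nn.Module):
    def __init__(self, encoder, dim=256, num_classes=4):
        super().__init__()
        self.encoder = encoder
        self.head = nn.Linear(dim, num_classes)   # per-patch head as a stand-in for a real task head

    def forward(self, patches):                    # patches: (B, N, patch*patch)
        return self.head(self.encoder(patches))

model = FineTuneModel(pretrained.encoder)
# Optionally freeze the encoder for the first rounds, then unfreeze later:
for p in model.encoder.parameters():
    p.requires_grad = False
optimizer = torch.optim.AdamW(filter(lambda p: p.requires_grad, model.parameters()), lr=1e-3)
criterion = nn.CrossEntropyLoss()
```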
Workflow Flexibility:
Users will have the option to run pretraining followed by fine-tuning in a seamless workflow.
Alternatively, users can choose to run only the fine-tuning process and compare the results with the pretrained model.
The user should also be able to configure dataset distribution using Dirichlet distribution-based partitioning. This will allow the user to compare independent and identically distributed (IID) data against non-IID data, which is the more realistic setting and the one where SSL pretraining excels (see the sketch below).
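A minimal sketch of Dirichlet-based partitioning across collaborators is shown below (pure NumPy). Here `labels` stands for whatever per-sample attribute is used to drive the skew, and `alpha` controls how non-IID the result is: small alpha gives highly skewed shards, large alpha approaches an IID split.

```python
import numpy as np

def dirichlet_partition(labels, num_collaborators, alpha, seed=0):
    """Split sample indices across collaborators using a Dirichlet prior.

    Small alpha -> highly skewed (non-IID) shards; large alpha -> near-IID.
    `labels` is a 1-D array of per-sample class labels driving the skew.
    """
    rng = np.random.default_rng(seed)
    partitions = [[] for _ in range(num_collaborators)]
    for cls in np.unique(labels):
        cls_idx = np.where(labels == cls)[0]
        rng.shuffle(cls_idx)
        # Fraction of this class assigned to each collaborator
        proportions = rng.dirichlet([alpha] * num_collaborators)
        cut_points = (np.cumsum(proportions) * len(cls_idx)).astype(int)[:-1]
        for part, chunk in zip(partitions, np.split(cls_idx, cut_points)):
            part.extend(chunk.tolist())
    return partitions

# Example: alpha=0.3 gives a clearly non-IID split, alpha=100 is close to IID.
# shards = dirichlet_partition(np.array(sample_labels), num_collaborators=4, alpha=0.3)
```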
Result Comparison:
We will provide a notebook to compare the final results of the fine-tuned model against the pretrained model.
This will help users evaluate the effectiveness of the SSL pretraining process and make informed decisions about model performance.
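The comparison notebook could, for example, evaluate both models on a held-out labeled split using a common metric such as Dice. The loop below is a generic sketch; `baseline_model` (trained on labels only), `finetuned_model` (SSL-pretrained then fine-tuned), and `val_loader` are placeholders for the real artifacts.

```python
# Generic side-by-side evaluation sketch for the comparison notebook.
import torch

def dice_score(pred, target, eps=1e-6):
    # pred/target: binary tensors of identical shape
    intersection = (pred * target).sum()
    return (2 * intersection + eps) / (pred.sum() + target.sum() + eps)

@torch.no_grad()
def evaluate(model, loader):
    model.eval()
    scores = []
    for images, masks in loader:
        preds = (torch.sigmoid(model(images)) > 0.5).float()
        scores.append(dice_score(preds, masks).item())
    return sum(scores) / len(scores)

# print("labels-only baseline :", evaluate(baseline_model, val_loader))
# print("SSL-pretrained + FT  :", evaluate(finetuned_model, val_loader))
```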
Benefits:
Efficiency: By leveraging SSL algorithms, users can utilize abundant unlabeled data to build robust initial models, reducing the dependency on scarce labeled data.
Performance Evaluation: Comparison tools will enable users to assess the impact of SSL pretraining on model performance, leading to better optimization and deployment strategies.