Feature Request: Enable Self-Supervised Learning in OpenFL #1297

Closed
porteratzo opened this issue Jan 22, 2025 · 1 comment

porteratzo commented Jan 22, 2025

Is your feature request related to a problem? Please describe.
Collaborators often face challenges in contributing to federated learning training due to the scarcity of labeled data. Unlabeled data is generally far more abundant than labeled data, but without a method to train on unlabeled data, this resource remains underutilized.

Describe the solution you'd like
Introduce Self-Supervised Learning (SSL) algorithms into OpenFL to enable training on unlabeled data. This can be achieved by creating workflows that use techniques such as Masked Autoencoders (MAE) or DINOv2. These algorithms can pretrain models on unlabeled data, and the pretrained models can then be fine-tuned on labeled data for specific tasks. The final model is expected to achieve better accuracy than models trained solely on labeled data.
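
A minimal sketch of the two-stage idea (masked-reconstruction pretraining on unlabeled data, then supervised fine-tuning of the same encoder), assuming PyTorch. The model, helper names, and shapes below are illustrative placeholders, not OpenFL, MAE, or DINOv2 APIs:

```python
# Illustrative sketch only (assumes PyTorch); not an OpenFL API or a full MAE implementation.
import torch
import torch.nn as nn

class TinyAutoencoder(nn.Module):
    """Toy encoder/decoder standing in for a real MAE-style backbone."""
    def __init__(self, dim=784, hidden=128):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU())
        self.decoder = nn.Linear(hidden, dim)

    def forward(self, x):
        return self.decoder(self.encoder(x))

def pretrain_step(model, x, mask_ratio=0.75):
    """Self-supervised step: hide a random subset of the input and reconstruct it."""
    mask = (torch.rand_like(x) < mask_ratio).float()
    recon = model(x * (1 - mask))                      # the model never sees masked values
    return ((recon - x) ** 2 * mask).sum() / mask.sum().clamp(min=1)

def finetune_step(encoder, head, x, y):
    """Supervised step: reuse the pretrained encoder with a task-specific head."""
    return nn.functional.cross_entropy(head(encoder(x)), y)

# Stage 1: pretrain on abundant unlabeled batches.
model = TinyAutoencoder()
x_unlabeled = torch.rand(32, 784)
pretrain_step(model, x_unlabeled).backward()

# Stage 2: fine-tune on the (smaller) labeled set with a hypothetical 2-class head.
head = nn.Linear(128, 2)
x_labeled, y = torch.rand(8, 784), torch.randint(0, 2, (8,))
finetune_step(model.encoder, head, x_labeled, y).backward()
```

In a federated setting, stage 1 would run as one OpenFL workflow over the collaborators' unlabeled data, and stage 2 as a second workflow that starts from the aggregated pretrained weights.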

porteratzo commented Jan 23, 2025

Proposed Integration: OpenFL Workspaces for Pretraining and Fine-Tuning

Objective:

To enhance the functionality of OpenFL by creating two distinct workspaces: one dedicated to pretraining and another for fine-tuning. This will enable users to understand and compare the benefits of SSL pretraining.

Dataset:

We propose using the BraTS2020 dataset, which is already approved for use at Intel. The dataset can be accessed here.

Features:

  1. Pretraining Workspace:

    • Users can run the pretraining workflow using the full BraTS2020 dataset.
    • This workspace will train on the full dataset without labels to build a robust initial model for fine-tuning.
  2. Fine-Tuning Workspace:

    • Users can run the fine-tuning workflow using a subset of the BraTS2020 dataset.
    • This workspace will allow for more focused training to refine the pretrained model for specific tasks or datasets.
  3. Workflow Flexibility:

    • Users will have the option to run pretraining followed by fine-tuning in a seamless workflow.
    • Alternatively, users can choose to run only the fine-tuning process and compare the results with the pretrained model.
    • The user should also be able to configure the dataset distribution using Dirichlet distribution-based partitioning. This will allow the user to compare independent and identically distributed (IID) data against non-IID data, which is a more realistic distribution and one where SSL pretraining excels (see the partitioning sketch after this list).
  4. Result Comparison:

    • We will provide a notebook to compare the final results of the fine-tuned model against the pretrained model.
    • This will help users evaluate the effectiveness of the SSL pretraining process and make informed decisions about model performance.
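
A minimal sketch of the Dirichlet distribution-based partitioning mentioned in item 3, assuming NumPy. The helper name and signature are illustrative, not an existing OpenFL utility:

```python
# Illustrative sketch only (assumes NumPy); not an existing OpenFL data-splitting API.
import numpy as np

def dirichlet_partition(labels, num_collaborators, alpha, seed=0):
    """Assign sample indices to collaborators; smaller alpha -> more skewed (non-IID) splits."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    shards = [[] for _ in range(num_collaborators)]
    for cls in np.unique(labels):
        # Shuffle this class's samples and split them according to Dirichlet proportions.
        idx = rng.permutation(np.where(labels == cls)[0])
        proportions = rng.dirichlet(alpha * np.ones(num_collaborators))
        cuts = (np.cumsum(proportions)[:-1] * len(idx)).astype(int)
        for shard, part in zip(shards, np.split(idx, cuts)):
            shard.extend(part.tolist())
    return shards

# Large alpha approximates IID; small alpha concentrates each class on a few collaborators.
labels = np.random.randint(0, 4, size=1000)
iid_like = dirichlet_partition(labels, num_collaborators=4, alpha=100.0)
non_iid = dirichlet_partition(labels, num_collaborators=4, alpha=0.1)
```

Exposing alpha as a workspace setting would let users reproduce both the IID baseline and the skewed splits where SSL pretraining is expected to help most.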

Benefits:

  • Efficiency: By leveraging SSL algorithms, users can utilize abundant unlabeled data to build robust initial models, reducing the dependency on scarce labeled data.
  • Performance Evaluation: Comparison tools will enable users to assess the impact of SSL pretraining on model performance, leading to better optimization and deployment strategies.

securefederatedai locked and limited conversation to collaborators Jan 27, 2025
porteratzo converted this issue into discussion #1316 Jan 27, 2025

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →
