Combining Two Datasets in YOLO #754
👋 Hello @jaisenbe58r, thank you for raising an issue about Ultralytics HUB 🚀! Please visit our HUB Docs to learn more.
If this is a 🐛 Bug Report, please provide screenshots and steps to reproduce your problem to help us get started working on a fix. If this is a ❓ Question, please provide as much information as possible, including dataset, model, and environment details, so that we can provide the most helpful response. We try to respond to all issues as promptly as possible. Thank you for your patience!
@jaisenbe58r Hello!
Hello @sergiuwaxman, That's unfortunate to hear. Thank you for your clarification and assistance. Best regards,
Hello @jaisenbe58r, Thank you for your understanding. To combine your datasets effectively, you can create a new dataset directory that includes the images and labels from both datasets. Here's a step-by-step guide to help you structure your combined dataset:
# File: combined_dataset.yaml
path: ../datasets/combined_dataset # root directory of the combined dataset
train: images/train # train images (relative to 'path')
val: images/val # val images (relative to 'path')
# Number of classes
nc: <number_of_classes>
# Class names
names: ['class1', 'class2', 'class3', ...]
# Validate the zipped dataset before uploading it to Ultralytics HUB
from ultralytics.hub import check_dataset
check_dataset("path/to/combined_dataset.zip", task="detect")
By following these steps, you can combine your datasets and use them for training and validation. If you encounter any issues or need further assistance, feel free to reach out. We're here to help!
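The copy step described above can be sketched in Python. This is a minimal illustration, not part of the Ultralytics API: `merge_datasets` and `IMAGE_SUFFIXES` are hypothetical names, and the sketch assumes each source dataset follows the `images/<split>` and `labels/<split>` layout with one `.txt` label per image. Files are prefixed with the source directory name so that identically named images from the two datasets do not overwrite each other.

```python
import shutil
from pathlib import Path

# Assumption: the image extensions present in your datasets; extend as needed.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}


def merge_datasets(sources, dest, splits=("train", "val")):
    """Copy images and labels from each source dataset into dest.

    Every copied file is renamed to "<source dir name>_<original name>" so
    that files with the same name in different datasets cannot collide.
    Returns the number of images copied.
    """
    dest = Path(dest)
    copied = 0
    for src in map(Path, sources):
        for split in splits:
            img_out = dest / "images" / split
            lbl_out = dest / "labels" / split
            img_out.mkdir(parents=True, exist_ok=True)
            lbl_out.mkdir(parents=True, exist_ok=True)

            img_dir = src / "images" / split
            if not img_dir.is_dir():
                continue  # this source has no such split
            for img in sorted(img_dir.iterdir()):
                if img.suffix.lower() not in IMAGE_SUFFIXES:
                    continue
                new_stem = f"{src.name}_{img.stem}"
                shutil.copy2(img, img_out / f"{new_stem}{img.suffix}")
                # Copy the matching label file if there is one.
                label = src / "labels" / split / f"{img.stem}.txt"
                if label.exists():
                    shutil.copy2(label, lbl_out / f"{new_stem}.txt")
                copied += 1
    return copied
```

With the paths from this thread, a call might look like `merge_datasets(["/path/to/datasets/dataset1", "/path/to/datasets/dataset2"], "/path/to/combined_dataset")`. Note that this sketch does not remap class indices, so it assumes both datasets already share the same class list.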
Hello @pderrenger, Thank you for the detailed explanation and the step-by-step guide on combining the datasets. It is very helpful. I have a follow-up question: is there no way to structure the YAML file so that it references both datasets directly, like this?
# File: combined_dataset.yaml
path: ../datasets/combined_dataset # root directory of the combined dataset
train:
- dataset1/images/train
- dataset2/images/train
val:
- dataset1/images/val
- dataset2/images/val
# Number of classes
nc: <number_of_classes>
# Class names
names: ['class1', 'class2', 'class3', ...]
Would this approach not be possible or supported by the Ultralytics framework? It would be more convenient to maintain the original datasets separately. Thank you again for your assistance! Best regards,
Hello @jaisenbe58r, Thank you for your kind words and for the follow-up question! Currently, the Ultralytics framework does not support specifying multiple paths for the train and val entries. To maintain the original datasets separately while still combining them for training, you can use symbolic links (symlinks) to create a unified directory structure without duplicating the data. Here's how you can do it:
# Create the combined directory structure (ln fails if the targets do not exist)
mkdir -p /path/to/combined_dataset/images/train /path/to/combined_dataset/images/val
mkdir -p /path/to/combined_dataset/labels/train /path/to/combined_dataset/labels/val
# Create symlinks for training images
ln -s /path/to/datasets/dataset1/images/train/* /path/to/combined_dataset/images/train/
ln -s /path/to/datasets/dataset2/images/train/* /path/to/combined_dataset/images/train/
# Create symlinks for validation images
ln -s /path/to/datasets/dataset1/images/val/* /path/to/combined_dataset/images/val/
ln -s /path/to/datasets/dataset2/images/val/* /path/to/combined_dataset/images/val/
# Create symlinks for training labels
ln -s /path/to/datasets/dataset1/labels/train/* /path/to/combined_dataset/labels/train/
ln -s /path/to/datasets/dataset2/labels/train/* /path/to/combined_dataset/labels/train/
# Create symlinks for validation labels
ln -s /path/to/datasets/dataset1/labels/val/* /path/to/combined_dataset/labels/val/
ln -s /path/to/datasets/dataset2/labels/val/* /path/to/combined_dataset/labels/val/
# File: combined_dataset.yaml
path: ../datasets/combined_dataset # root directory of the combined dataset
train: images/train # train images (relative to 'path')
val: images/val # val images (relative to 'path')
# Number of classes
nc: <number_of_classes>
# Class names
names: ['class1', 'class2', 'class3', ...]
This way, you can maintain the original datasets separately while creating a unified structure for training purposes. If you have any further questions or need additional assistance, feel free to ask. We're here to help!
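One caveat with the `ln -s` commands above: if both datasets contain a file with the same name (for example `0001.jpg`), the second `ln -s` reports "File exists" for that file and the link is not created. The sketch below is one possible way to detect and avoid this from Python. Both helpers, `find_name_collisions` and `link_with_prefix`, are hypothetical names introduced here for illustration and are not part of Ultralytics. Creating symlinks may require extra privileges on Windows.

```python
from pathlib import Path


def find_name_collisions(dir_a, dir_b):
    """Return the sorted file names that exist in both directories."""
    names_a = {p.name for p in Path(dir_a).iterdir() if p.is_file()}
    names_b = {p.name for p in Path(dir_b).iterdir() if p.is_file()}
    return sorted(names_a & names_b)


def link_with_prefix(src_dir, dest_dir, prefix):
    """Symlink every file in src_dir into dest_dir as "<prefix>_<name>".

    Using the same prefix for a dataset's images and labels keeps each
    image stem matched to its label stem. Returns the number of links made.
    """
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    created = 0
    for src in sorted(Path(src_dir).iterdir()):
        if not src.is_file():
            continue
        link = dest_dir / f"{prefix}_{src.name}"
        if link.is_symlink() or link.exists():
            continue  # leave anything already present untouched
        link.symlink_to(src.resolve())
        created += 1
    return created
```

For example, `find_name_collisions("/path/to/datasets/dataset1/images/train", "/path/to/datasets/dataset2/images/train")` lists the clashing names. If the list is non-empty, calling `link_with_prefix` once per source directory (images and labels, each split) with a distinct prefix such as `"dataset1"` or `"dataset2"` avoids the clash while still leaving the original datasets untouched.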
Hello @pderrenger, Thank you for your detailed response and for clarifying the limitations of the current Ultralytics framework. I appreciate your suggestion to use symbolic links to create a unified directory structure without duplicating data. This solution effectively addresses my concern about maintaining the original datasets separately while combining them for training. I will proceed with creating the symlinks and updating the YAML file as you recommended. Thank you once again for your assistance and support! Best regards,
Hello @jaisenbe58r, You're very welcome! I'm glad to hear that the suggestion to use symbolic links was helpful and addresses your concern about maintaining the original datasets separately. If you encounter any issues while setting up the symlinks or have any further questions as you proceed, please don't hesitate to reach out. Additionally, if you haven't already, please ensure that you're using the latest versions of the Ultralytics packages, since an issue you run into may already have been resolved in a recent update. For any future issues or questions, providing a minimum reproducible example can greatly assist us in diagnosing and resolving your concerns more efficiently. You can find more information on creating such examples here. Thank you for your engagement and for being a part of the YOLO community. Happy training!
Question
Hello Ultralytics Team,
I am currently working on a project using YOLO and I have two separate datasets that I would like to combine for training and validation purposes. I would like to know if it is possible to combine these datasets, and if so, how should I structure my data.yml file to properly reference both datasets?
Example Details:
/path/to/datasets/dataset1/images/
/path/to/datasets/dataset1/labels/
/path/to/datasets/dataset2/images/
/path/to/datasets/dataset2/labels/
Folder Structure:
Proposed data.yml Configuration:
Could you please confirm if this setup is correct and whether it is possible to combine datasets in this manner? If there are any additional steps or considerations I need to take into account, please let me know.
Thank you for your assistance!
Best regards,
Jaime
Additional
No response