Training data setup #7

gopinath1509 · 2025-02-18T22:16:45Z

The paper says the feature extraction model was fine-tuned on a subset of COYO-700M. What size was the subset, given that the entire dataset runs to terabytes? It would also be great if the code for extracting and setting up the subset could be released!

kliyer-ai · 2025-02-22T13:52:08Z

Hey, the subset we refer to includes all images from COYO-700M with a short side of at least 512 px. You can obtain it by downloading COYO-700M using the img2dataset utility (https://github.com/rom1504/img2dataset) and setting the min_image_size flag to 512. However, note that we train for only 400 steps, meaning we use only a small portion of this subset. Since we randomly shuffle the shards at the start of training, I cannot provide the exact samples the model was trained on.
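For anyone reproducing this, here is a minimal sketch of how the img2dataset command described above might be assembled. It is not the authors' actual script. The metadata and output paths are hypothetical placeholders, and the `url`/`text` column names and the webdataset output format are assumptions based on the public COYO-700M parquet metadata. The thread does not state which resize settings were used, so they are left out here; check img2dataset's defaults before running.

```python
import shlex


def build_img2dataset_command(metadata_dir, output_dir, min_image_size=512):
    """Assemble an img2dataset CLI call that drops images whose short side is too small.

    metadata_dir and output_dir are hypothetical paths chosen by the reader.
    """
    return [
        "img2dataset",
        "--url_list", metadata_dir,          # folder containing the COYO-700M parquet files
        "--input_format", "parquet",
        "--url_col", "url",                  # assumed column name in the COYO-700M metadata
        "--caption_col", "text",             # assumed column name in the COYO-700M metadata
        "--output_format", "webdataset",     # assumption: sharded tar output
        "--output_folder", output_dir,
        "--min_image_size", str(min_image_size),  # the filter mentioned in the reply
    ]


cmd = build_img2dataset_command("coyo-700m-metadata", "coyo-700m-min512")
# Print a copy-pasteable shell command instead of executing it.
print(shlex.join(cmd))
```

The command is only printed so it can be inspected and adjusted (process counts, resize options) before starting a multi-terabyte download.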
