vLLM bug #83

hqhQAQ · 2025-02-13T09:41:19Z

The current version of vLLM training may have a bug.
For example, when the number of GPUs and num_generations are both 4, the current vLLM training requires that all 4 samples in each batch correspond to the same question and image. In practice, however, even though RepeatRandomSampler is used, these 4 samples may correspond to different questions and images.
(Correct me if I'm wrong.)
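One way to check this claim is to simulate the index stream and verify that every group of num_generations samples in a global batch points at the same prompt. The snippet below is only a minimal sketch in plain Python: `repeat_random_indices`, `global_batches`, and `groups_are_uniform` are hypothetical helpers written for illustration, not TRL's or this repo's actual RepeatRandomSampler, and the sketch assumes a per-device batch size of 1 and that consecutive indices fill one global batch across the 4 GPUs.

```python
import random
from typing import Iterator, List


def repeat_random_indices(num_prompts: int, repeat_count: int, seed: int = 0) -> Iterator[int]:
    """Hypothetical stand-in for a repeat sampler: shuffle the prompt
    indices, then yield each one `repeat_count` times in a row."""
    rng = random.Random(seed)
    order = list(range(num_prompts))
    rng.shuffle(order)
    for idx in order:
        for _ in range(repeat_count):
            yield idx


def global_batches(indices: List[int], world_size: int, per_device_batch: int = 1) -> List[List[int]]:
    """Assumption: one global batch is `world_size * per_device_batch`
    consecutive indices; an incomplete trailing batch is dropped."""
    size = world_size * per_device_batch
    return [indices[i:i + size] for i in range(0, len(indices) - size + 1, size)]


def groups_are_uniform(batches: List[List[int]], num_generations: int) -> bool:
    """Return True if every run of `num_generations` samples inside each
    batch refers to a single prompt index."""
    for batch in batches:
        for i in range(0, len(batch), num_generations):
            if len(set(batch[i:i + num_generations])) != 1:
                return False
    return True


# Setting from the report: 4 GPUs and num_generations = 4.
indices = list(repeat_random_indices(num_prompts=8, repeat_count=4))
batches = global_batches(indices, world_size=4)
print(groups_are_uniform(batches, num_generations=4))
```

Under these assumptions the groups stay aligned; a mismatch would only show up if the real data loader sharded or reordered indices differently, which is what printing the actual per-rank prompts during training would reveal.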


TobiasLee · 2025-02-13T09:43:50Z

Can you provide outputs showing the different questions and images?
If so, the original GRPOTrainer would be wrong as well: huggingface/trl#2776

hqhQAQ · 2025-02-13T09:51:59Z

Maybe I got it wrong, let me take another look 😰

hqhQAQ closed this as completed Feb 13, 2025
