Help needed for fine-tuning training params #780

Answered by craffel
sbmaruf asked this question in Q&A

Hi, for the remaining questions:

  1. Adafactor
  2. No
  3. Batch size makes almost no difference; what matters is the total number of tokens seen over the course of fine-tuning.
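
One way to read point 3, as a rough sketch: if the total number of tokens seen is what matters, then changing the batch size means rescaling the number of steps so the token budget stays fixed. The snippet below assumes tokens seen = batch size × sequence length × steps (ignoring padding). The function names and the numbers are hypothetical, chosen only for illustration, and are not taken from any particular codebase.

```python
def total_tokens(batch_size: int, seq_len: int, steps: int) -> int:
    """Tokens seen during fine-tuning, assuming every sequence is full length."""
    return batch_size * seq_len * steps


def steps_for_budget(token_budget: int, batch_size: int, seq_len: int) -> int:
    """Steps needed to see at least `token_budget` tokens (rounded up)."""
    tokens_per_step = batch_size * seq_len
    return -(-token_budget // tokens_per_step)  # ceiling division


# Illustrative numbers only: halving the batch size doubles the steps
# required to reach the same token budget.
budget = total_tokens(batch_size=128, seq_len=512, steps=1000)
print(budget)
print(steps_for_budget(budget, batch_size=64, seq_len=512))
```

Under this assumption, two runs with different batch sizes are comparable when their token budgets match, not when their step counts match.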

Answer selected by sbmaruf