I think that's worth implementing. Instead of text tokens, we can feed in GPT-2 embeddings from the last layer. It might even be smarter with text than the original DALL-E, because GPT-2 was trained on a large amount of text, while DALL-E has to rediscover everything about language from just the captions, which are short and fragmented.
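For anyone who wants to prototype this, here's a minimal sketch of the feature-extraction half, assuming a hypothetically modified DALL-E transformer that consumes projected embeddings in place of its learned text-token embeddings (`dalle_dim` and the `project` layer are assumptions for illustration, not part of the existing DALLE-pytorch API):

```python
# Minimal sketch: pull frozen GPT-2 last-layer hidden states for a caption,
# then project them to whatever width the modified DALL-E text pathway
# expects. `dalle_dim` and `project` are illustrative assumptions.
import torch
from transformers import GPT2Tokenizer, GPT2Model

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
gpt2 = GPT2Model.from_pretrained("gpt2").eval()  # frozen feature extractor

caption = "an armchair in the shape of an avocado"
inputs = tokenizer(caption, return_tensors="pt")

with torch.no_grad():
    # last_hidden_state: (batch, seq_len, 768) for the 124M checkpoint
    text_features = gpt2(**inputs).last_hidden_state

dalle_dim = 512  # assumed embedding width of the modified DALL-E transformer
project = torch.nn.Linear(gpt2.config.n_embd, dalle_dim)
dalle_text_input = project(text_features)  # fed in place of token embeddings
print(dalle_text_input.shape)  # (1, seq_len, 512)
```

Freezing GPT-2 and training only the projection alongside the image transformer would be the cheap way to test whether the pretrained language features actually help.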
---
GPT-2 Extra Large (~1.5B parameters) + the DALL-E PyTorch implementation.
(Or a fine-tuned 355M model.)
Do you think that would be a feasible idea?
It would be a dumbed-down version of DALL-E for sure, but it would open up many more possibilities.
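For what it's worth, swapping checkpoints in a sketch like the one above is only a string change, and a projection layer would absorb the different hidden widths:

```python
from transformers import GPT2Model

# ~1.5B parameters, 1600-dim hidden states
gpt2_xl = GPT2Model.from_pretrained("gpt2-xl")
# ~355M parameters, 1024-dim hidden states
gpt2_medium = GPT2Model.from_pretrained("gpt2-medium")
```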