
[Research Project] Add AnyText: Multilingual Visual Text Generation And Editing #8998

Draft: wants to merge 102 commits into base: main


@tolgacangoz (Contributor) commented Jul 28, 2024

Thanks for the opportunity to fix #6407!

AnyText is a diffusion pipeline with two main components: an auxiliary latent module and a text embedding module. The auxiliary latent module takes the text glyph, position, and masked image as inputs and generates latent features for text generation or editing. The text embedding module uses an OCR model to encode stroke data as embeddings, which are blended with the image caption embeddings from the tokenizer so that the generated text integrates seamlessly with the background. For training, the authors used a text-control diffusion loss and a text perceptual loss to further improve writing accuracy.
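To make the "blend" step of the text embedding module concrete, here is a minimal sketch in plain Python. It assumes that each text line to be rendered is marked in the caption by a placeholder token, and that the OCR-derived embedding for that line replaces the placeholder's token embedding. The names `blend_embeddings`, `token_table`, and `placeholder_id` are hypothetical and are not taken from this PR or the original repository; the real module works on tensors.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def blend_embeddings(
    token_ids: Sequence[int],
    token_table: Dict[int, Vector],
    placeholder_id: int,
    glyph_embeddings: Sequence[Vector],
) -> List[Vector]:
    """Swap each placeholder token's embedding for the next glyph embedding.

    Assumption: glyph_embeddings are already projected to the same width
    as the caption token embeddings, one entry per placeholder, in order.
    """
    blended: List[Vector] = []
    glyphs = iter(glyph_embeddings)
    for token_id in token_ids:
        if token_id == placeholder_id:
            try:
                # Placeholder position: use the OCR-derived embedding.
                blended.append(list(next(glyphs)))
            except StopIteration:
                raise ValueError("more placeholders than glyph embeddings")
        else:
            # Ordinary caption token: keep its embedding from the table.
            blended.append(list(token_table[token_id]))
    return blended
```

The output has one vector per caption token, so downstream code that expects ordinary prompt embeddings would not need to change under this assumption.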

Paper: AnyText: Multilingual Visual Text Generation And Editing
Repository: https://github.com/tyxsspa/AnyText
Hugging Face Space: modelscope/AnyText


TODOs:
AuxiliaryLatentModule
AnyTextControlNetModel -> Inherited and adapted from ControlNetModel. The only difference is that it uses a Glyph Block, a Position Block, and a Fuse Block instead of the input_hint_block (controlnet_cond_embedding, i.e. ControlNetConditioningEmbedding) of an ordinary ControlNet. I deactivated the ControlNetConditioningEmbedding part and moved the new blocks into AuxiliaryLatentModule just to comply with the figure.
AnyTextPipeline -> Adapted from StableDiffusionControlNetPipeline.
TextEmbeddingModule -> Replaces the encode_prompt() function. I may transfer what TextEmbeddingModule does into encode_prompt().
convert_anytext_to_diffusers.py
⏳ Verify outputs with the original implementation
⏳ Finish HF integration & upload converted checkpoints to HF
README.md
⬜ Make it as simple as possible, but not simpler


This commit adds improvements to the modify_prompt method in the AnyTextPipeline class. The method now handles special characters and replaces selected string prompts with a placeholder. Additionally, it includes a check for Chinese text and translation using the trans_pipe.
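As a rough illustration of the behavior described in this commit message, the sketch below pulls quoted strings out of a prompt, leaves a placeholder in their place, and checks for Chinese characters. It is a simplified standalone function, not the method in this PR: the placeholder value `*`, the quote handling, and the helper name `contains_chinese` are assumptions, and the translation step through `trans_pipe` is left out.

```python
import re
from typing import List, Tuple

PLACEHOLDER = "*"  # assumed placeholder; the commit message does not name it


def contains_chinese(text: str) -> bool:
    """Return True if any character is in the CJK Unified Ideographs block."""
    return any("\u4e00" <= ch <= "\u9fff" for ch in text)


def modify_prompt(prompt: str) -> Tuple[str, List[str]]:
    """Extract double-quoted strings and replace each with a placeholder."""
    # Normalise curly double quotes to straight ones before matching.
    prompt = prompt.replace("\u201c", '"').replace("\u201d", '"')
    # Non-greedy match so each quoted string is captured separately.
    texts = re.findall(r'"(.*?)"', prompt)
    for text in texts:
        prompt = prompt.replace(f'"{text}"', f" {PLACEHOLDER} ", 1)
    # Collapse the extra whitespace introduced by the replacements.
    prompt = re.sub(r"\s+", " ", prompt).strip()
    return prompt, texts
```

A caller could run `contains_chinese` on the returned prompt to decide whether a translation step is needed before encoding.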
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.


This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@github-actions github-actions bot added the stale Issues that haven't received updates label Nov 30, 2024
@github-actions github-actions bot removed the stale Issues that haven't received updates label Jan 1, 2025
Successfully merging this pull request may close these issues.

AnyText: Multilingual Visual Text Generation And Editing
3 participants