Text-to-image Generation

AIGC Datasets

  • CommonCanvas CommonCanvas: An Open Diffusion Model Trained with Creative-Commons Images [PDF]

    Feature: 70 million high-quality images with high-quality synthetic captions

  • JDB JourneyDB: A Benchmark for Generative Image Understanding [PDF, Page]

    Feature: 4 million Midjourney images

  • DiffusionDB DiffusionDB: A Large-scale Prompt Gallery Dataset for Text-to-Image Generative Models [PDF, Page]

    Feature: 14 million Stable Diffusion images

Diffusion-based

*[ICML 2022; OpenAI] ---GLIDE--- GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models [PDF, Code]

[arxiv 2022; Microsoft] Vector Quantized Diffusion Model for Text-to-Image Synthesis [PDF, Code]

[CVPR 2022; SUNY] Towards Language-Free Training for Text-to-Image Generation [PDF, Code]

[ECCV 2022; UIUC ] Compositional Visual Generation with Composable Diffusion Models [PDF, Code]

[arxiv 2022; ByteDance] CLIP-GEN: Language-Free Training of a Text-to-Image Generator with CLIP [PDF, Code]

*[arxiv 2022; OpenAI ] ---DALL-E2--- Hierarchical Text-Conditional Image Generation with CLIP Latents [PDF, Code]

*[CVPR 2022] ---LDM--- High-Resolution Image Synthesis with Latent Diffusion Models [PDF, Code]

*[arxiv 2022; Google] ---Imagen--- Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding [PDF, Code]

[arxiv 2023.01] Simple diffusion: End-to-end diffusion for high resolution images [PDF]

[arxiv 2023.07] SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis [PDF, Page]

[arxiv 2023.09] Emu: Enhancing Image Generation Models Using Photogenic Needles in a Haystack [PDF, Page]

[tech report] DALL-E 3: Improving Image Generation with Better Captions [PDF, Page]

[arxiv 2023.10] Matryoshka Diffusion Models [PDF]

[arxiv 2023.12] Kandinsky-3: Text-to-image diffusion model [PDF, Page]

[arxiv 2024.01] Taiyi-Diffusion-XL: Advancing Bilingual Text-to-Image Generation with Large Vision-Language Model Support [PDF]

[arxiv 2024.02] Playground v2.5: Three Insights towards Enhancing Aesthetic Quality in Text-to-Image Generation [PDF, Page]

[arxiv 2024.03] SD3: Scaling Rectified Flow Transformers for High-Resolution Image Synthesis [PDF]

[arxiv 2024.03] PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation [PDF, Page]

[arxiv 2024.03] CogView3: Finer and Faster Text-to-Image Generation via Relay Diffusion [PDF]

[arxiv 2024.03] Multistep Consistency Models [PDF]

[arxiv 2024.04] CosmicMan: A Text-to-Image Foundation Model for Humans [PDF, Page]

[arxiv 2024.05] Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding [PDF, Page]

[arxiv 2024.05] Improving the Training of Rectified Flows [PDF, Page]

[arxiv 2024.06] Lumina-Next: Making Lumina-T2X Stronger and Faster with Next-DiT [PDF, Page]

[arxiv 2024.07] Kolors: Effective Training of Diffusion Model for Photorealistic Text-to-Image Synthesis [PDF, Page]

[arxiv 2024.08] Imagen 3 [PDF]

[arxiv 2024.10] Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer [PDF, Page]


GAN/VAE/Transformer-based

[ICML 2021; OpenAI ] Zero-Shot Text-to-Image Generation [PDF, Code 3]

[CVPR 2021; Google ] Cross-Modal Contrastive Learning for Text-to-Image Generation [PDF, Code]

[KDD, 2021; Alibaba ] ---M6--- M6 : A Chinese Multimodal Pretrainer [PDF, Code]

[arxiv 2021; Baidu] ---ERNIE-ViLG--- ERNIE-ViLG: Unified Generative Pre-training for Bidirectional Vision-Language Generation [PDF, Code]

[ECCV 2022] ---DT2I--- DT2I: Dense Text-to-Image Generation from Region Descriptions [PDF, Code]

*[arxiv 2022; Meta ] ---Make-a-scene--- Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors [PDF, Code 3]

*[arxiv 2022; Google] ---Parti--- Scaling Autoregressive Models for Content-Rich Text-to-Image Generation [PDF, Code]

[arxiv 2022; Tsinghua ] ---CogView2--- CogView2: Faster and Better Text-to-Image Generation via Hierarchical Transformers [PDF, Code]

[ECCV 2022; Microsoft] ---NÜWA--- NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion [PDF, Code]

[NIPS 2022; Microsoft] ---NÜWA-Infinity--- NUWA-Infinity: Autoregressive over Autoregressive Generation for Infinite Visual Synthesis [PDF, Code]

*[arxiv 2023.1; Google] ---Muse--- Muse: Text-To-Image Generation via Masked Generative Transformers [PDF, Page]

[arxiv 2023.01] Attribute-Centric Compositional Text-to-Image Generation [PDF]

[arxiv 2023.01] ---StyleGAN-T--- StyleGAN-T: Unlocking the Power of GANs for Fast Large-Scale Text-to-Image Synthesis [PDF, Page]

[arxiv 2023.03] Scaling up GANs for Text-to-Image Synthesis [PDF, Page]

[arxiv 2023.10] PIXART-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis [PDF, Page]

[arxiv 2024.08] VAR-CLIP: Text-to-Image Generator with Visual Auto-Regressive Modeling [PDF, Page]

[arxiv 2024.09] MaskBit: Embedding-free Image Generation via Bit Token [PDF, Page]


Autoregressive

[arxiv 2023.07] Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning [PDF]

[arxiv 2024.04] Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction [PDF, Page]

[arxiv 2024.06] Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation [PDF, Page]

[arxiv 2024.06] Autoregressive Image Generation without Vector Quantization [PDF]

[arxiv 2024.07] MARS: Mixture of Auto-Regressive Models for Fine-grained Text-to-image Synthesis [PDF, Page]

[arxiv 2024.08] Scalable Autoregressive Image Generation with Mamba [PDF, Page]

[arxiv 2024.10] Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis [PDF, Page]

[arxiv 2024.10] DART: Denoising Autoregressive Transformer for Scalable Text-to-Image Generation [PDF]

[arxiv 2024.10] Fluid: Scaling Autoregressive Text-to-image Generative Models with Continuous Tokens [PDF]


Generation & Super-resolution

[TPAMI 2022; Google ] Image Super-Resolution via Iterative Refinement [PDF, Code]

[CVPR 2022; POSTECH] Autoregressive Image Generation using Residual Quantization [PDF, Code]

[SIGGRAPH 2022; Google] ---Palette--- Palette: Image-to-Image Diffusion Models [PDF, Code]

[arxiv 2022; Google] Cascaded Diffusion Models for High Fidelity Image Generation [PDF, Code]

[arxiv 2023.06] Designing a Better Asymmetric VQGAN for StableDiffusion [PDF, Code]

Scene

[arxiv 2022.12] Generative Scene Synthesis via Incremental View Inpainting using RGBD Diffusion Models [PDF]

[arxiv 2022.12] Benchmarking Spatial Relationships in Text-to-Image Generation [PDF]

Privacy

[arxiv 2023.02; DeepMind] Differentially Private Diffusion Models Generate Useful Synthetic Images [PDF]

Transformer Related

*[ICLR 2022; Google] ---ViT-VQGAN--- Vector-quantized Image Modeling with Improved VQGAN [PDF]

*[CVPR 2021; Heidelberg] ---VQGAN--- Taming transformers for high-resolution image synthesis [PDF, Page, Code]

*[arxiv 2022.02] MaskGIT: Masked Generative Image Transformer [PDF]

Diffusion related

*[arxiv 2022.12] Scalable Diffusion Models with Transformers [PDF, Page]

[arxiv 2024.01] Scalable High-Resolution Pixel-Space Image Synthesis with Hourglass Diffusion Transformers [PDF, Page]

Benchmark

[arxiv 2023.10] DEsignBench: Exploring and Benchmarking DALL-E 3 for Imagining Visual Design [PDF, Page]

Study

[arxiv 2023.02] A Pilot Evaluation of ChatGPT and DALL-E 2 on Decision Making and Spatial Reasoning [PDF]

[arxiv 2023.02] Exploring the Representation Manifolds of Stable Diffusion Through the Lens of Intrinsic Dimension [PDF]

[arxiv 2023.02] Fair Diffusion: Instructing Text-to-Image Generation Models on Fairness [PDF]

[arxiv 2023.02] Unsupervised Discovery of Semantic Latent Directions in Diffusion Models [PDF]

[arxiv 2023.03] A Prompt Log Analysis of Text-to-Image Generation Systems [PDF]

[arxiv 2023.10] A Picture is Worth a Thousand Words: Principled Recaptioning Improves Image Generation [PDF]

[arxiv 2023.10] Davidsonian Scene Graph: Improving Reliability in Fine-grained Evaluation for Text-Image Generation [PDF, Page]

Image caption

[arxiv 2023.07] SITTA: A Semantic Image-Text Alignment for Image Captioning [PDF, Page]