- CommonCanvas: An Open Diffusion Model Trained with Creative-Commons Images [PDF]
  Feature: 70 million high-quality images with high-quality synthetic captions
- JourneyDB: A Benchmark for Generative Image Understanding [PDF, Page]
  Feature: 4 million Midjourney images
- DiffusionDB: A Large-scale Prompt Gallery Dataset for Text-to-Image Generative Models [PDF, Page]
  Feature: 14 million Stable Diffusion images
*[ICML 2022; OpenAI] ---GLIDE--- GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models [PDF, Code]
[arxiv 2022; Microsoft] Vector Quantized Diffusion Model for Text-to-Image Synthesis [PDF, Code]
[CVPR 2022; SUNY] Towards Language-Free Training for Text-to-Image Generation [PDF, Code]
[ECCV 2022; UIUC] Compositional Visual Generation with Composable Diffusion Models [PDF, Code]
[arxiv 2022; ByteDance] CLIP-GEN: Language-Free Training of a Text-to-Image Generator with CLIP [PDF, Code]
*[arxiv 2022; OpenAI] ---DALL-E 2--- Hierarchical Text-Conditional Image Generation with CLIP Latents [PDF, Code]
*[CVPR 2022] ---LDM--- High-Resolution Image Synthesis with Latent Diffusion Models [PDF, Code]
*[arxiv 2022; Google] ---Imagen--- Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding [PDF, Code]
[arxiv 2023.01] Simple Diffusion: End-to-end Diffusion for High Resolution Images [PDF]
[arxiv 2023.07] SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis [PDF, Page]
[arxiv 2023.09] Emu: Enhancing Image Generation Models Using Photogenic Needles in a Haystack [PDF, Page]
[tech report] DALL-E 3: Improving Image Generation with Better Captions [PDF, Page]
[arxiv 2023.10] Matryoshka Diffusion Models [PDF]
[arxiv 2023.12] Kandinsky-3: Text-to-Image Diffusion Model [PDF, Page]
[arxiv 2024.01] Taiyi-Diffusion-XL: Advancing Bilingual Text-to-Image Generation with Large Vision-Language Model Support [PDF]
[arxiv 2024.02] Playground v2.5: Three Insights towards Enhancing Aesthetic Quality in Text-to-Image Generation [PDF, Page]
[arxiv 2024.03] SD3: Scaling Rectified Flow Transformers for High-Resolution Image Synthesis [PDF]
[arxiv 2024.03] PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation [PDF, Page]
[arxiv 2024.03] CogView3: Finer and Faster Text-to-Image Generation via Relay Diffusion [PDF]
[arxiv 2024.03] Multistep Consistency Models [PDF]
[arxiv 2024.04] CosmicMan: A Text-to-Image Foundation Model for Humans [PDF, Page]
[arxiv 2024.05] Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding [PDF, Page]
[arxiv 2024.05] Improving the Training of Rectified Flows [PDF, Page]
[arxiv 2024.06] Lumina-Next: Making Lumina-T2X Stronger and Faster with Next-DiT [PDF, Page]
[arxiv 2024.07] Kolors: Effective Training of Diffusion Model for Photorealistic Text-to-Image Synthesis [PDF, Page]
[arxiv 2024.08] Imagen 3 [PDF]
[arxiv 2024.10] Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer [PDF, Page]
[ICML 2021; OpenAI] Zero-Shot Text-to-Image Generation [PDF, Code 3]
[CVPR 2021; Google] Cross-Modal Contrastive Learning for Text-to-Image Generation [PDF, Code]
[KDD 2021; Alibaba] ---M6--- M6: A Chinese Multimodal Pretrainer [PDF, Code]
[arxiv 2021; Baidu] ---ERNIE-ViLG--- ERNIE-ViLG: Unified Generative Pre-training for Bidirectional Vision-Language Generation [PDF, Code]
[ECCV 2022] ---DT2I--- DT2I: Dense Text-to-Image Generation from Region Descriptions [PDF, Code]
*[arxiv 2022; Meta] ---Make-A-Scene--- Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors [PDF, Code 3]
*[arxiv 2022; Google] ---Parti--- Scaling Autoregressive Models for Content-Rich Text-to-Image Generation [PDF, Code]
[arxiv 2022; Tsinghua] ---CogView2--- CogView2: Faster and Better Text-to-Image Generation via Hierarchical Transformers [PDF, Code]
[ECCV 2022; Microsoft] ---NÜWA--- NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion [PDF, Code]
[NIPS 2022; Microsoft] ---NÜWA-Infinity--- NUWA-Infinity: Autoregressive over Autoregressive Generation for Infinite Visual Synthesis [PDF, Code]
*[arxiv 2023.01; Google] ---Muse--- Muse: Text-To-Image Generation via Masked Generative Transformers [PDF, Page]
[arxiv 2023.01] Attribute-Centric Compositional Text-to-Image Generation [PDF]
[arxiv 2023.01] ---StyleGAN-T--- StyleGAN-T: Unlocking the Power of GANs for Fast Large-Scale Text-to-Image Synthesis [PDF, Page]
[arxiv 2023.03] Scaling up GANs for Text-to-Image Synthesis [PDF, Page]
[arxiv 2023.10] PIXART-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis [PDF, Page]
[arxiv 2024.08] VAR-CLIP: Text-to-Image Generator with Visual Auto-Regressive Modeling [PDF, Page]
[arxiv 2024.09] MaskBit: Embedding-free Image Generation via Bit Tokens [PDF, Page]
[arxiv 2023.07] Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning [PDF]
[arxiv 2024.04] Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction [PDF, Page]
[arxiv 2024.06] Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation [PDF, Page]
[arxiv 2024.06] Autoregressive Image Generation without Vector Quantization [PDF]
[arxiv 2024.07] MARS: Mixture of Auto-Regressive Models for Fine-grained Text-to-image Synthesis [PDF, Page]
[arxiv 2024.08] Scalable Autoregressive Image Generation with Mamba [PDF, Page]
[arxiv 2024.10] Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis [PDF, Page]
[arxiv 2024.10] DART: Denoising Autoregressive Transformer for Scalable Text-to-Image Generation [PDF]
[arxiv 2024.10] Fluid: Scaling Autoregressive Text-to-image Generative Models with Continuous Tokens [PDF]
[TPAMI 2022; Google] Image Super-Resolution via Iterative Refinement [PDF, Code]
[CVPR 2022; POSTECH] Autoregressive Image Generation using Residual Quantization [PDF, Code]
[SIGGRAPH 2022; Google] ---Palette--- Palette: Image-to-Image Diffusion Models [PDF, Code]
[arxiv 2022; Google] Cascaded Diffusion Models for High Fidelity Image Generation [PDF, Code]
[arxiv 2023.06] Designing a Better Asymmetric VQGAN for StableDiffusion [PDF, Code]
[arxiv 2022.12] Generative Scene Synthesis via Incremental View Inpainting using RGBD Diffusion Models [PDF]
[arxiv 2022.12] Benchmarking Spatial Relationships in Text-to-Image Generation [PDF]
[arxiv 2023.02; DeepMind] Differentially Private Diffusion Models Generate Useful Synthetic Images [PDF]
*[ICLR 2022; Google] ---ViT-VQGAN--- Vector-quantized Image Modeling with Improved VQGAN [PDF]
*[CVPR 2021; Heidelberg] ---VQGAN--- Taming Transformers for High-Resolution Image Synthesis [PDF, Page, Code]
*[arxiv 2022.02] MaskGIT: Masked Generative Image Transformer [PDF]
*[arxiv 2022.12] Scalable Diffusion Models with Transformers [PDF, Page]
[arxiv 2024.01] Scalable High-Resolution Pixel-Space Image Synthesis with Hourglass Diffusion Transformers [PDF, Page]
[arxiv 2023.10] DEsignBench: Exploring and Benchmarking DALL-E 3 for Imagining Visual Design [PDF, Page]
[arxiv 2023.02] A Pilot Evaluation of ChatGPT and DALL-E 2 on Decision Making and Spatial Reasoning [PDF]
[arxiv 2023.02] Exploring the Representation Manifolds of Stable Diffusion Through the Lens of Intrinsic Dimension [PDF]
[arxiv 2023.02] Fair Diffusion: Instructing Text-to-Image Generation Models on Fairness [PDF]
[arxiv 2023.02] Unsupervised Discovery of Semantic Latent Directions in Diffusion Models [PDF]
[arxiv 2023.03] A Prompt Log Analysis of Text-to-Image Generation Systems [PDF]
[arxiv 2023.10] A Picture is Worth a Thousand Words: Principled Recaptioning Improves Image Generation [PDF]
[arxiv 2023.10] Davidsonian Scene Graph: Improving Reliability in Fine-grained Evaluation for Text-Image Generation [PDF, Page]
[arxiv 2023.07] SITTA: A Semantic Image-Text Alignment for Image Captioning [PDF, Page]