NeurIPS 2024

Meta Info

Homepage: https://neurips.cc/Conferences/2024

Paper list: https://neurips.cc/virtual/2024/papers.html?filter=titles

Acceptance Rate

  • Total submissions: 15671
  • Accept: 25.8% (4037)
    • Poster: 23.3% (3650)
    • Spotlight: 2.1% (326)
    • Oral: 0.4% (61)

Papers

Large Language Models (LLMs)

  • LLM Inference
    • SGLang: Efficient Execution of Structured Language Model Programs [Paper] [Code] [arXiv]
      • Stanford & UC Berkeley
      • Co-design both the front-end language (programming interface) and the back-end runtime
      • SGLang Primitives
        • Enable the manipulation of prompts and generations (see the usage sketch after this list)
          • gen: call LLM generation
          • select: let the LLM choose the option with the highest probability from a list
          • extend or +=: extend the current prompt
        • Control of parallelism
          • fork: fork the current prompt state
          • join: rejoin the forked prompt states
      • Compilation optimizations
        • Code movement for improving prefix sharing
          • An aggressive optimization: it does not strictly preserve the original computation
          • Prompt GPT-4 to re-order graph nodes
      • Runtime
        • RadixAttention
          • Utilize a radix tree (w/ efficient prefix search, reuse, insertion, eviction)
          • LRU eviction policy
        • Cache-aware scheduling → Increase the cache hit rate
          • Key idea: Sort the requests by matched prefix length (see the scheduling sketch after this list)
    • Efficient LLM Scheduling by Learning to Rank [Paper] [Code]
      • UCSD & THU & Snowflake & UC Berkeley
      • Insight: it is possible to predict the relative ranks of output lengths in a batch of requests.
      • Develop a scheduler for LLM inference that can approximate the shortest-job-first (SJF) schedule better than existing approaches
  • Compound AI Systems
    • Are More LM Calls All You Need? Towards the Scaling Properties of Compound AI Systems [Paper] [Code]
      • Stanford & UC Berkeley & Princeton
      • Systematically study how the number of LM calls affects the performance of two natural inference strategy designs.
        • Vote: Aggregate LM responses via majority voting
        • Filter-Vote: Majority voting after filtering results with an LM
      • Insight
        • More LM calls lead to higher performance on “easy” queries, but lower performance on “hard” queries, and nonmonotone behavior can emerge when a task contains both types of queries.
      • Propose an analytical scaling model that predicts the performance of Vote and Filter-Vote systems and finds the optimal number of LM calls to make (a toy rendering of both strategies follows this list)
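
To make the SGLang primitives listed above concrete, here is a small usage sketch in the style of SGLang's Python front end. It assumes the `sglang` package is installed; exact decorator and argument names may differ across versions, so treat it as illustrative rather than canonical.

```python
import sglang as sgl

@sgl.function
def expand_tips(s, topic):
    # "+=" extends the current prompt state
    s += "Here are two tips about " + topic + ":\n"
    # select: let the LLM pick the highest-probability option from a list
    s += "Audience: " + sgl.select("audience", choices=["beginner", "expert"]) + "\n"
    # fork: branch the shared prompt prefix into two parallel states
    forks = s.fork(2)
    for i, f in enumerate(forks):
        f += f"Expand tip {i + 1} into a short paragraph:\n"
        # gen: call LLM generation and store the result under a name
        f += sgl.gen("tip", max_tokens=128)
    # join: fold the forked results back into the main prompt state
    s += "Tip 1: " + forks[0]["tip"] + "\nTip 2: " + forks[1]["tip"] + "\n"
    s += "Summary: " + sgl.gen("summary", max_tokens=64)
```

Because both forks share the same prompt prefix, a prefix-caching runtime such as RadixAttention can serve them from a single cached KV entry.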
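
The cache-aware scheduling policy in SGLang's runtime reduces to a simple priority rule. The sketch below assumes a hypothetical `match_prefix_len` helper standing in for the radix-tree prefix lookup; it is not the authors' implementation.

```python
from typing import Callable, List

def schedule_cache_aware(pending_prompts: List[str],
                         match_prefix_len: Callable[[str], int]) -> List[str]:
    """Order waiting requests so that those reusing the longest cached prefix
    (as reported by the radix tree) run first, increasing the cache hit rate."""
    return sorted(pending_prompts, key=match_prefix_len, reverse=True)
```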
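
The Vote and Filter-Vote strategies studied in the compound-AI-systems paper can be rendered in a few lines of Python. `call_lm` and `lm_filter` are hypothetical stand-ins for the underlying LM calls, so this is a toy sketch of the aggregation logic only.

```python
from collections import Counter
from typing import Callable, List

def vote(query: str, call_lm: Callable[[str], str], n_calls: int) -> str:
    """Vote: aggregate n_calls LM responses via majority voting."""
    answers: List[str] = [call_lm(query) for _ in range(n_calls)]
    return Counter(answers).most_common(1)[0][0]

def filter_vote(query: str, call_lm: Callable[[str], str],
                lm_filter: Callable[[str, str], bool], n_calls: int) -> str:
    """Filter-Vote: discard answers an LM judges incorrect, then majority-vote
    over the survivors (falling back to plain Vote if everything is filtered)."""
    answers = [call_lm(query) for _ in range(n_calls)]
    kept = [a for a in answers if lm_filter(query, a)]
    return Counter(kept or answers).most_common(1)[0][0]
```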

Diffusion Models

  • Adapter Selection
    • Stylus: Automatic Adapter Selection for Diffusion Models [Paper] [Homepage] [Code]
      • UC Berkeley & CMU & Google DeepMind
      • Problem: how to match the prompt to a set of relevant adapters
      • Stylus
        • Select and automatically compose task-specific adapters based on a prompt's keywords
        • Three-stage approach
          1. Refiner: Leverage vision-language foundation models (VLMs) to generate semantic descriptions of adapters, then translate them into embeddings
          2. Retriever: Fetch the most relevant adapters over the entirety of the user’s prompt using cosine similarity (sketched after this list)
          3. Composer: Segment the prompt into tasks from a prompt’s keywords and assign retrieved adapters to tasks
      • StylusDocs
        • An adapter dataset consisting of 75K LoRAs (sourced from Civitai) with pre-computed adapter embeddings
  • Inference
    • Reverse Transition Kernel: A Flexible Framework to Accelerate Diffusion Inference [Paper]
      • HKUST & HKU & Salesforce AI Research & UIUC
      • Develop a general RTK (reverse transition kernel) framework that enables a more balanced subproblem decomposition
      • Propose leveraging two fast sampling algorithms, the Metropolis-Adjusted Langevin Algorithm (MALA) and Underdamped Langevin Dynamics (ULD), for solving these strongly log-concave subproblems
    • Accelerating Diffusion Models with Parallel Sampling: Inference at Sub-Linear Time Complexity [Paper]
      • Stanford
      • Propose to divide the sampling process into $$O(1)$$ blocks with parallelizable Picard iterations within each block (the Picard update is recalled after this list)
  • Talking Face Video Generation
    • VASA-1: Lifelike Audio-Driven Talking Faces Generated in Real Time [Paper] [Homepage]
      • MSRA
      • A framework to generate lifelike talking faces with appealing visual affective skills (VAS).
      • A diffusion-based holistic facial dynamics and head movement generation model that works in a face latent space.
      • Support the online generation of 512×512 videos at up to 40 FPS.
  • Facial Parts Swapping
    • FuseAnyPart: Diffusion-Driven Facial Parts Swapping via Multiple Reference Images [Paper] [Code (coming...)]
      • Alibaba
      • Facial parts from different people are assembled into a complete face in latent space within the Mask-based Fusion Module
      • The consolidated feature is dispatched to the Addition-based Injection Module for fusion within the UNet of the diffusion model to create novel characters
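
The Retriever stage of Stylus boils down to cosine-similarity search over the pre-computed adapter embeddings (e.g., those shipped with StylusDocs). The snippet below is a minimal sketch of that lookup; the embedding source and `top_k` value are illustrative assumptions.

```python
import numpy as np

def retrieve_adapters(prompt_emb: np.ndarray,
                      adapter_embs: np.ndarray,  # (num_adapters, dim), pre-computed
                      top_k: int = 5) -> np.ndarray:
    """Return indices of the top_k adapters whose embeddings are most
    cosine-similar to the prompt embedding."""
    a = adapter_embs / np.linalg.norm(adapter_embs, axis=1, keepdims=True)
    p = prompt_emb / np.linalg.norm(prompt_emb)
    return np.argsort(-(a @ p))[:top_k]
```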
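
As background for the parallel-sampling paper above: the Picard fixed-point iteration for an ODE $$\dot{x}(t) = f(x(t), t)$$ on a block $$[0, T]$$ takes the standard textbook form (not necessarily the paper's exact notation)

$$x^{k+1}(t) = x(0) + \int_{0}^{t} f\big(x^{k}(s), s\big)\, ds, \qquad t \in [0, T].$$

Because the right-hand side depends only on the previous iterate $$x^{k}$$, every time point in the block can be updated simultaneously, which is what makes the per-block computation parallelizable.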

Autoregressive Image Generation

  • Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction [Paper] [Code] [arXiv]
    • PKU & ByteDance
    • Best Paper Award
    • VAR: Visual Autoregressive Modeling
    • Redefine autoregressive learning on images as coarse-to-fine “next-scale prediction” or “next-resolution prediction”
    • Multi-scale token maps are autoregressively generated from coarse to fine scales (lower to higher resolutions), with parallel token generation within each scale; a schematic generation loop follows this list
  • Autoregressive Image Generation without Vector Quantization [Paper] [Code] [arXiv]
    • MIT & Google DeepMind & THU
    • Propose to model the per-token probability distribution using a diffusion procedure
    • Define a Diffusion Loss function to model the per-token probability (the objective is sketched after this list)
    • Evaluated across a wide range of cases, including standard autoregressive models and generalized masked autoregressive (MAR) variants
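
A schematic sketch (not the authors' code) of the coarse-to-fine loop implied by next-scale prediction: at each step the model emits an entire token map for the next resolution, conditioned on all coarser maps, so tokens within a scale are produced in parallel. `var_transformer` and `scales` are hypothetical placeholders.

```python
from typing import Callable, List, Sequence, Tuple
import torch

def generate_by_next_scale(
    var_transformer: Callable[[List[torch.Tensor], Tuple[int, int]], torch.Tensor],
    scales: Sequence[Tuple[int, int]],  # e.g. [(1, 1), (2, 2), (4, 4), ..., (16, 16)]
) -> List[torch.Tensor]:
    """Autoregression over scales: each token map is predicted in one shot
    (all tokens of that scale in parallel), conditioned on the coarser maps."""
    token_maps: List[torch.Tensor] = []
    for hw in scales:
        next_map = var_transformer(token_maps, hw)  # predict the (h, w) token map
        token_maps.append(next_map)
    return token_maps  # the finest map is decoded into the output image
```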
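
In standard DDPM notation (the paper's exact formulation may differ slightly), the Diffusion Loss for a continuous-valued token $$x$$ with conditioning vector $$z$$ produced by the autoregressive backbone trains a small denoising network $$\varepsilon_\theta$$ via

$$\mathcal{L}(z, x) = \mathbb{E}_{\varepsilon, t}\left[\lVert \varepsilon - \varepsilon_\theta(x_t \mid t, z) \rVert^2\right], \qquad x_t = \sqrt{\bar{\alpha}_t}\, x + \sqrt{1 - \bar{\alpha}_t}\, \varepsilon,$$

so sampling a token amounts to running a token-level reverse diffusion process conditioned on $$z$$, with no vector quantization required.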

Text-to-Video Generation

  • Inference
    • Fast and Memory-Efficient Video Diffusion Using Streamlined Inference [Paper] [Code]
      • NEU
      • Streamlined Inference: Leverage the temporal and spatial properties of video diffusion models
      • Three core components
        • Feature Slicer: Partition input features into sub-features
        • Operator Grouping: Process each sub-feature with a group of consecutive operators (see the sketch at the end of this section)
        • Step Rehash: Accelerate inference through skipping unnecessary steps
  • Evaluation
    • VidProM: A Million-scale Real Prompt-Gallery Dataset for Text-to-Video Diffusion Models [Paper] [Homepage] [Code] [Dataset]
      • UTS & ZJU
      • 1.67M unique text-to-video prompts from real users.
      • 6.69M videos generated by four state-of-the-art diffusion models (Pika, VideoCraft2, Text2Video-Zero, ModelScope).
    • Evaluation of Text-to-Video Generation Models: A Dynamics Perspective [Paper] [Homepage] [Code]
      • UCAS & HIT & Adelaide & Baidu
      • Existing evaluation protocols primarily focus on temporal consistency and content continuity, yet largely ignore the dynamics of video content.
      • DEVIL: An evaluation protocol that centers on the dynamics dimension to evaluate T2V generation models
    • Boosting Text-to-Video Generative Model with MLLMs Feedback [Paper]
      • MSRA
      • Utilize Multimodal Large Language Models (MLLMs) to perform fine-grained video preference annotations → VideoPrefer (13.5K preference annotations)
      • VideoRM: The reward model for text-to-video alignment
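
The Feature Slicer and Operator Grouping components described above amount to trading peak activation memory for sequential slice processing. The sketch below is a toy illustration under the assumption that each operator in the group acts independently along the sliced dimension (e.g., pointwise ops); it is not the authors' implementation.

```python
from typing import Sequence
import torch
import torch.nn as nn

def grouped_sliced_forward(x: torch.Tensor,
                           op_group: Sequence[nn.Module],
                           num_slices: int = 4,
                           dim: int = -1) -> torch.Tensor:
    """Feature Slicer + Operator Grouping: run each sub-feature through the whole
    operator group, keeping only one slice's activations live at a time."""
    outputs = []
    for piece in torch.chunk(x, num_slices, dim=dim):  # Feature Slicer
        for op in op_group:                            # Operator Grouping
            piece = op(piece)
        outputs.append(piece)
    return torch.cat(outputs, dim=dim)                 # reassemble the feature map
```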